{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Llama-3.2-1B Demo\n",
    "\n",
    "In this demo, you'll see a few of the attribution graphs found with Llama-3.2-1B. Below, you'll find examples, as well as a number of intervention experiments that will prove the correctness of our annotations / supernodes found in the graph."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here are some example graphs. We present each prompt formatted as `input` → `expected output`:\n",
    "- [`Because the manager retired the project` → `, / was`](https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-garden-path-nps&pinnedIds=17_11_6%2C17_574_6%2CE_2447_6%2C12_123901_6%2C12_105590_6%2C10_117209_4%2CE_22311_4%2C8_3291_1%2CE_18433_1%2C3_33599_1%2C2_50381_1%2C2_60861_1%2C3_9634_1%2C9_102504_6%2C10_117209_6&clickedId=9_102504_6&supernodes=%5B%5B%22because%22%2C%222_50381_1%22%2C%228_3291_1%22%2C%223_9634_1%22%2C%222_60861_1%22%2C%223_33599_1%22%5D%2C%5B%22subject+nouns%22%2C%2212_105590_6%22%2C%229_102504_6%22%5D%5D&pruningThreshold=0.7&densityThreshold=0.99&clerps=%5B%5B%2210_10117209_4%22%2C%22End+of+subordinate+clause%22%5D%2C%5B%2210_10117209_6%22%2C%22End+of+subordinate+clause%22%5D%2C%5B%2212_12123901_6%22%2C%22End+of+subordinate+clause%22%5D%5D)\n",
    "- [`Zagreb:Croatia :: Copenhagen:` → `Denmark`](<https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-croatia-denmark&pinnedIds=17_24539_9%2C17_35440_9%2C11_121939_9%2C10_89787_9%2C8_62013_8%2C7_21695_8%2C12_79838_9%2C13_99984_9%2C14_3293_9%2C13_50480_9%2C14_95110_9%2C15_95091_9%2CE_64161_8%2C5_29307_8%2C2_58224_8%2C1_75211_8%2C3_99868_8%2C11_129365_9%2C11_61135_9%2C11_121939_8%2C12_57025_9%2C11_69725_9%2C11_69725_8%2C9_39864_9%2C12_127919_9%2C10_108437_9%2C10_76757_9%2C9_85946_9%2CE_25_9%2C8_48318_6%2C2_75517_6%2C1_74922_6%2C3_17431_6%2C2_71193_6%2CE_18283_5%2CE_689_6%2C0_88159_6%2C1_37886_6%2C11_90929_9%2C10_89787_8%2C12_127919_8%2C9_39864_8%2C11_129365_8&supernodes=%5B%5B%22Scandinavia%22%2C%2215_95091_9%22%2C%2213_99984_9%22%2C%2214_3293_9%22%2C%2211_121939_9%22%2C%2211_129365_9%22%5D%2C%5B%22Commas+separating+cities+and+states%22%2C%2210_108437_9%22%2C%2211_61135_9%22%2C%2210_76757_9%22%2C%229_85946_9%22%5D%2C%5B%22Countries%22%2C%222_71193_6%22%2C%228_48318_6%22%2C%223_17431_6%22%2C%221_74922_6%22%2C%220_88159_6%22%2C%221_37886_6%22%2C%222_75517_6%22%5D%2C%5B%22upweight+demonyms%22%2C%2212_57025_9%22%2C%2211_90929_9%22%5D%2C%5B%22Scandinavia%22%2C%2210_89787_9%22%2C%2211_129365_8%22%2C%2210_89787_8%22%2C%2211_121939_8%22%2C%228_62013_8%22%2C%227_21695_8%22%2C%225_29307_8%22%2C%223_99868_8%22%2C%221_75211_8%22%2C%222_58224_8%22%5D%2C%5B%22Denmark%22%2C%2211_69725_8%22%2C%229_39864_8%22%2C%2212_127919_8%22%5D%2C%5B%22Denmark%22%2C%229_39864_9%22%2C%2214_95110_9%22%2C%2211_69725_9%22%2C%2212_127919_9%22%2C%2212_79838_9%22%2C%2213_50480_9%22%5D%2C%5B%22Output+%5C%22Denmark%5C%22%22%2C%2217_35440_9%22%2C%2217_24539_9%22%5D%5D&clickedId=17_24539_9&pruningThreshold=0.7&densityThreshold=0.99>)\n",
    "- [`Fait: Michael Jordan joue au` → `(basket)`](<https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-michael-jordan-fr&clerps=%5B%5B%2211090659%22,%22sports%22%5D%5D&pinnedIds=17_14351_8,17_19794_8,13_84984_8,11_90659_8,12_99181_8,13_60715_8,10_126099_8,10_126099_5,11_90659_7,11_64053_8,12_15278_8,9_8897_8,5_103377_8,5_103377_5,3_31856_5,4_70880_5,4_5148_5,4_90298_5,3_64167_5,2_130928_5,2_114643_5,2_109470_5,1_8130_5,0_19479_5,0_73223_5,E_17527_5,E_8096_4,15_8246_8,15_120719_8,13_130501_8,14_62649_8,12_94888_8,11_12581_8,12_82732_8,10_48389_8,9_96739_8,8_76812_8,8_93624_8,7_1802_8,5_114658_8,5_32065_8,2_7182_8,3_111642_8,1_18095_8,E_8065_8,E_68_7,11_73277_8,11_73277_7&supernodes=%5B%5B%22%5C%22Michael%5C%22+celebrities%22,%220_73223_5%22,%221_8130_5%22%5D,%5B%22Output+basketball%22,%2217_19794_8%22,%2217_14351_8%22%5D,%5B%22basketball%22,%2213_84984_8%22,%2210_126099_8%22%5D,%5B%22celebrities/public+figures%22,%222_130928_5%22,%224_70880_5%22,%222_114643_5%22,%220_19479_5%22%5D,%5B%22sports%22,%229_8897_8%22,%2211_64053_8%22,%2212_15278_8%22,%225_103377_8%22,%2212_99181_8%22,%2213_60715_8%22,%2211_90659_8%22%5D,%5B%22basketball%22,%222_109470_5%22,%2210_126099_5%22,%224_90298_5%22%5D,%5B%22French%22,%2211_73277_8%22,%2212_94888_8%22,%2211_73277_7%22%5D,%5B%22French+articles%22,%2212_82732_8%22,%2214_62649_8%22,%2213_130501_8%22,%2215_8246_8%22,%2215_120719_8%22%5D,%5B%22romance/foreign+language+articles%22,%225_114658_8%22,%227_1802_8%22,%2211_12581_8%22,%229_96739_8%22,%2210_48389_8%22%5D,%5B%22non-English+(broken+formatting)+text%22,%228_76812_8%22,%221_18095_8%22,%222_7182_8%22,%223_111642_8%22,%225_32065_8%22,%228_93624_8%22%5D,%5B%22sports%22,%2211_90659_7%22,%223_31856_5%22,%223_64167_5%22,%225_103377_5%22,%224_5148_5%22%5D%5D&clickedId=11_90659_7&pruningThreshold=0.7&densityThreshold=0.99>)\n",
    "- [`The International Advanced Security Group (IAS` → `G`](https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-iasg&clerps=%5B%5B%228098486%22%2C%22tokens+starting+with+G%22%5D%2C%5B%2210021529%22%2C%22tokens+starting+with+G+%28and+those+after+them%29%22%5D%2C%5B%229046963%22%2C%22%5C%22International+Institute%5C%22%22%5D%5D&pinnedIds=17_38_7%2C10_21529_5%2CE_32922_7%2C14_28630_7%2C14_114483_7%2C8_98486_5%2C13_75472_7%2CE_5856_5%2C12_74956_7%2C8_63901_5%2C10_80084_5%2C13_35252_7%2C9_88649_5%2C0_44874_5%2C10_66863_7%2C13_104558_7%2C12_115969_7%2C10_94711_7%2C3_112354_7%2C2_89312_7%2C0_83171_7%2C3_106012_7%2C10_74154_7%2C9_46963_7%2C11_80990_7%2C11_20594_7%2C12_112311_7%2C13_19034_7&supernodes=%5B%5B%22group%22%2C%2210_80084_5%22%2C%228_63901_5%22%2C%220_44874_5%22%2C%229_88649_5%22%5D%2C%5B%22all+caps+in+%28parenthetical%29+acronyms+%28first+token%29%22%2C%2210_66863_7%22%2C%2213_35252_7%22%2C%2212_74956_7%22%5D%2C%5B%22Upweight+token+starting+with+G%22%2C%2213_75472_7%22%2C%2214_28630_7%22%5D%2C%5B%22acronyms+starting+with+I%22%2C%2211_20594_7%22%2C%2212_112311_7%22%2C%2213_19034_7%22%2C%2211_80990_7%22%2C%2210_74154_7%22%5D%2C%5B%22caps+in+acronyms%22%2C%220_83171_7%22%2C%222_89312_7%22%2C%223_106012_7%22%2C%2210_94711_7%22%5D%2C%5B%22capital+letters%22%2C%223_112354_7%22%2C%2214_114483_7%22%2C%2213_104558_7%22%2C%2212_115969_7%22%5D%2C%5B%22tokens+starting+with+G%22%2C%228_98486_5%22%2C%2210_21529_5%22%5D%5D&clickedId=9_46963_7&pruningThreshold=0.7&densityThreshold=0.99)\n",
    "- [`The keys on the cabinet` → `are`](https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-keys-cabinet&pinnedIds=17_1051_5%2C17_617_5%2C17_527_5%2C11_111069_5%2C11_111069_2%2CE_22685_5%2CE_7039_2%2C10_97308_2%2C8_69643_2%2C8_54079_3%2C10_122809_5%2C8_54079_4%2C6_39395_2%2C8_33998_2%2C9_119643_2&supernodes=%5B%5B%22output+plural+verb%22%2C%2217_617_5%22%2C%2217_1051_5%22%2C%2217_527_5%22%5D%2C%5B%22ends+of+plural+subject+NPs+%2F+predict+plural+verb%22%2C%2210_97308_2%22%2C%2211_111069_2%22%2C%229_119643_2%22%5D%2C%5B%22ends+of+plural+subject+NPs+%2F+predict+plural+verb%22%2C%2211_111069_5%22%2C%2210_122809_5%22%5D%2C%5B%22prepositions+after+plural+nouns%22%2C%228_54079_4%22%2C%228_54079_3%22%5D%2C%5B%22last+token+of+plural+nouns%22%2C%228_33998_2%22%2C%228_69643_2%22%2C%226_39395_2%22%5D%5D&densityThreshold=0.99)\n",
    "- [`The girls that the teacher sees` → `are`](https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-girls-are&clerps=%5B%5B%2211111069%22%2C%22ends+of+subject+NPs+%28upweight+%5C%22are%5C%22%29%22%5D%2C%5B%2210085650%22%2C%22ends+of+NPs+that+contain+a+verb%22%5D%5D&pinnedIds=17_527_6%2CE_16008_6%2C11_111069_2%2CE_7724_2%2C10_85650_6%2C8_87538_6%2C1_41984_2%2C0_82697_2%2C7_25556_6%2C2_74195_6%2CE_430_3%2C3_10135_6%2C5_113973_6&supernodes=%5B%5B%22girls%22%2C%220_82697_2%22%2C%221_41984_2%22%5D%2C%5B%22ends+of+relative+clauses+%28CPs+without+%5C%22that%5C%22%29%22%2C%222_74195_6%22%2C%223_10135_6%22%2C%225_113973_6%22%2C%227_25556_6%22%2C%228_87538_6%22%5D%5D&clickedId=10_85650_6&pruningThreshold=0.7&densityThreshold=0.99)\n",
    "- [`The girl that the teacher sees` → `is`](https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-girl-is&clerps=%5B%5B%2210085650%22%2C%22ends+of+NPs+that+contain+a+verb%22%5D%2C%5B%2282697%22%2C%22girls%2Fboys%22%5D%2C%5B%221111111%22%2C%22girl%2Fboy%22%5D%2C%5B%222083888%22%2C%22girls+%2F+boys%22%5D%2C%5B%22102096%22%2C%22that%22%5D%2C%5B%2288273%22%2C%22that%22%5D%2C%5B%221009269%22%2C%22verbs+ending+NPs+modified+by+RRCs%22%5D%2C%5B%227093286%22%2C%22potential+verbs+ending+NPs+modified+by+RCs%22%5D%2C%5B%226051995%22%2C%22verbs+ending+NPs+modified+by+RRCs%22%5D%5D&pinnedIds=17_374_6%2CE_16008_6%2C10_85650_6%2C8_37248_6%2C7_25556_6%2C9_1769_6%2C2_74195_6%2CE_430_3%2C3_10135_6%2C5_113973_6%2C0_128786_6%2CE_3828_2%2C0_82697_2%2C5_18_6%2C1_69168_6%2C1_111111_2%2C2_83888_2%2C7_93286_6%2C6_51995_6%2C1_9269_6%2C0_88273_3%2C0_102096_3%2C2_90664_3&supernodes=%5B%5B%22see%22%2C%221_69168_6%22%2C%220_128786_6%22%2C%229_1769_6%22%5D%2C%5B%22girls+%2F+boys%22%2C%222_83888_2%22%2C%221_111111_2%22%2C%220_82697_2%22%5D%2C%5B%22Potential+ends+of+multi-token+NPs+%28predict+verb%29%22%2C%227_93286_6%22%2C%226_51995_6%22%2C%228_37248_6%22%2C%2210_85650_6%22%2C%227_25556_6%22%2C%225_113973_6%22%2C%223_10135_6%22%2C%222_74195_6%22%2C%225_18_6%22%5D%2C%5B%22that%22%2C%222_90664_3%22%2C%220_102096_3%22%2C%220_88273_3%22%5D%5D&clickedId=2_74195_6)\n",
    "- [`Hecho: Michael Jordan juega al` → `baloncesto`](<https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-michael-jordan-es&clerps=[[%2211090659%22,%22sports%22],[%228127385%22,%22championship+/+sports%22],[%229021790%22,%22play+/+games%22],[%2211001268%22,%22non-english+languages%22]]&pinnedIds=17_9839_8%2C13_84984_8%2C12_2241_8%2C13_60715_8%2CE_453_8%2C12_99181_8%2C10_126099_8%2C11_90659_8%2C10_126099_5%2C12_15278_8%2C11_12581_8%2C11_64053_8%2C8_117275_5%2C5_103377_5%2C8_127385_5%2C8_127385_8%2C9_96739_8%2C8_127385_7%2C2_109470_5%2C4_70880_5%2C3_31856_5%2CE_17527_5%2CE_8096_4%2C11_1268_8%2C10_88209_8%2C9_21790_8%2C8_80913_8%2CE_48050_6%2CE_6885_7%2C2_7182_8%2C5_32065_8%2C3_111642_8%2C5_114658_8%2C7_1802_8%2C8_93624_8%2C8_76812_8%2C10_48389_8%2C2_60009_8%2C12_104936_8%2C13_103738_8%2C3_118311_5%2C5_67733_5%2C13_31291_8%2C13_62591_8&supernodes=[[%22sports%22,%228_127385_8%22,%2210_88209_8%22,%2213_60715_8%22,%2212_15278_8%22,%2212_99181_8%22,%2212_2241_8%22,%2211_90659_8%22,%2211_64053_8%22],[%22sports%22,%223_31856_5%22,%225_103377_5%22,%228_127385_5%22],[%22basketball-related+content%22,%228_117275_5%22,%2210_126099_5%22,%222_109470_5%22],[%22famous+athletes/people%22,%224_70880_5%22,%225_67733_5%22,%223_118311_5%22],[%22basketball%22,%2213_62591_8%22,%2213_84984_8%22,%2210_126099_8%22],[%22Spanish%22,%2213_31291_8%22,%2213_103738_8%22,%2212_104936_8%22],[%22play/games%22,%222_60009_8%22,%228_80913_8%22,%229_21790_8%22],[%22romance/foreign+language+articles+/+foreign+text%22,%228_76812_8%22,%225_32065_8%22,%223_111642_8%22,%222_7182_8%22,%228_93624_8%22,%2211_1268_8%22,%227_1802_8%22,%229_96739_8%22,%225_114658_8%22,%2211_12581_8%22,%2210_48389_8%22]]&clickedId=play/ga&pruningThreshold=0.7&densityThreshold=0.99>)\n",
    "- [`Mexico: Spanish :: US:` → `English`](<https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-Spanish-English&clerps=[[%2212020491%22,%22UK/English%22],[%2213010254%22,%22languages+(upweight+English)%22],[%2211015631%22,%22language%22],[%2212057025%22,%22language%22]]&pinnedIds=17_6498_6%2C11_105416_6%2C9_58293_3%2C10_2257_3%2C9_49585_3%2C5_6113_3%2CE_15506_3%2C10_813_6%2CE_2326_5%2C2_37574_5%2C4_75078_5%2C13_10254_6%2C11_97996_6%2C12_57025_6%2C12_20491_6%2C11_15631_6%2C0_74386_5&supernodes=[[%22language%22,%225_6113_3%22,%229_58293_3%22,%229_49585_3%22,%2210_2257_3%22],[%22languages+(upweight+English)%22,%2211_97996_6%22,%2211_105416_6%22],[%22US%22,%220_74386_5%22,%2210_813_6%22,%222_37574_5%22,%224_75078_5%22],[%22language%22,%2211_15631_6%22,%2212_57025_6%22]]&clickedId=11_15631_6&pruningThreshold=0.7&densityThreshold=0.99>)\n",
    "- [`Mexico: peso :: US:` → `dollar`](<https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-peso-dollar&clerps=%5B%5B%2211038164%22%2C%22money%22%5D%2C%5B%229093759%22%2C%22units%22%5D%2C%5B%2212022091%22%2C%22US%22%5D%5D&pinnedIds=17_18160_6%2C17_43464_6%2C17_11441_6%2C17_20121_6%2C12_116632_6%2C11_16313_6%2C11_42240_6%2C12_84127_6%2C10_60314_6%2C9_18536_6%2C8_62866_3%2C10_28752_6%2C4_1248_5%2C4_1248_4%2C9_93759_6%2C3_111633_3%2C2_128385_3%2C2_48203_3%2CE_38035_3%2CE_2326_5%2C10_813_6%2C2_37574_5%2C4_75078_5%2C0_74386_5%2C1_66436_5%2C11_38164_6%2C13_84235_6%2C12_22091_6%2C13_58987_6%2C11_67778_6%2C15_118166_6%2C14_98297_6&supernodes=%5B%5B%22Output+%5C%22+dollar%5C%22%22%2C%2217_18160_6%22%2C%2217_43464_6%22%2C%2217_11441_6%22%2C%2217_20121_6%22%5D%2C%5B%22US+%5Bpos%3A+US%5D%22%2C%224_75078_5%22%2C%222_37574_5%22%2C%221_66436_5%22%2C%220_74386_5%22%5D%2C%5B%22currency+%2F+finance+%2F+economics%22%2C%222_48203_3%22%2C%223_111633_3%22%2C%228_62866_3%22%2C%222_128385_3%22%5D%2C%5B%22currency%22%2C%224_1248_4%22%2C%224_1248_5%22%5D%2C%5B%22upweight+dollar%22%2C%2213_84235_6%22%2C%2212_84127_6%22%2C%2211_42240_6%22%2C%2211_16313_6%22%2C%2212_116632_6%22%2C%2210_60314_6%22%5D%2C%5B%22downweight+dollar%22%2C%2213_58987_6%22%2C%2214_98297_6%22%2C%2215_118166_6%22%5D%2C%5B%22US%22%2C%2212_22091_6%22%2C%2211_67778_6%22%2C%2210_813_6%22%5D%2C%5B%22currency%22%2C%2211_38164_6%22%2C%229_93759_6%22%2C%229_18536_6%22%2C%2210_28752_6%22%5D%5D&clickedId=12_22091_6&pruningThreshold=0.7&densityThreshold=0.99>)\n",
    "- [`Mexico: Spanish :: France: French :: Thailand:` → `Thai`](<https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-Spanish-French-Thai&pinnedIds=17_27490_10%2C11_63944_10%2C11_105416_10%2C11_15631_10%2C11_116644_10%2C11_63944_9%2C13_10254_10%2C10_115816_10%2C9_101986_10%2C8_127767_9%2C4_55772_9%2C3_122795_9%2CE_30567_9%2C2_37737_9%2C10_2257_10%2C9_58293_7%2C9_58293_3%2CE_8753_7%2C5_6113_7%2C5_6113_3%2C9_125851_10%2C8_74127_7%2CE_15506_3%2C0_66748_7&supernodes=%5B%5B%22languages%2Fdemonyms%22%2C%229_58293_3%22%2C%225_6113_3%22%5D%2C%5B%22languages%2Fdemonyms%22%2C%225_6113_7%22%2C%220_66748_7%22%2C%228_74127_7%22%2C%229_58293_7%22%5D%2C%5B%22languages+%28pos%3A+Thailand%29%22%2C%2211_15631_10%22%2C%2210_115816_10%22%2C%229_125851_10%22%2C%2210_2257_10%22%2C%2213_10254_10%22%2C%2211_105416_10%22%5D%2C%5B%22Thailand%2FSEA+%28pos%3AThailand%29%22%2C%2211_63944_9%22%2C%228_127767_9%22%2C%224_55772_9%22%2C%223_122795_9%22%2C%222_37737_9%22%5D%2C%5B%22Thailand%2FSEA+%28pos%3A+%3A%29%22%2C%2211_63944_10%22%2C%2211_116644_10%22%2C%229_101986_10%22%5D%5D&clickedId=Thailand%2FSEA+%28%3A%29&pruningThreshold=0.7&densityThreshold=0.99>)\n",
    "- [`Mexico: Spanish :: Thailand:` → `Thai`](https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-Spanish-Thai&clickedId=13_10254_6&pinnedIds=17_27490_6%2C11_105416_6%2C11_63944_6%2C13_10254_6%2C11_63944_5%2C12_57025_6%2C11_97996_6%2CE_15506_3%2C11_15631_6%2C11_116644_6%2C9_58293_3%2C10_115816_6%2C8_127767_5%2C9_101986_5%2CE_30567_5%2C4_55772_5%2C2_37737_5%2C3_122795_5%2C10_2257_6%2C5_6113_3%2C10_2257_3%2C9_49585_3%2C4_103160_3%2C3_28630_3%2C1_30074_3%2C2_88127_3%2C0_66748_3&supernodes=%5B%5B%22Spanish%22%2C%223_28630_3%22%2C%222_88127_3%22%2C%221_30074_3%22%5D%2C%5B%22demonyms%22%2C%220_66748_3%22%2C%224_103160_3%22%2C%225_6113_3%22%2C%229_49585_3%22%2C%2210_2257_3%22%2C%229_58293_3%22%5D%2C%5B%22Thailand%22%2C%2211_63944_5%22%2C%229_101986_5%22%2C%228_127767_5%22%2C%224_55772_5%22%2C%223_122795_5%22%2C%222_37737_5%22%5D%2C%5B%22Thailand%22%2C%2211_116644_6%22%2C%2211_63944_6%22%5D%2C%5B%22language%22%2C%2210_2257_6%22%2C%2210_115816_6%22%2C%2211_15631_6%22%5D%2C%5B%22languages+%28upweight+English%2C+Spanish%29%22%2C%2213_10254_6%22%2C%2211_105416_6%22%2C%2211_97996_6%22%2C%2212_57025_6%22%5D%5D&pruningThreshold=0.7&densityThreshold=0.99)\n",
    "- [`Mexico: peso :: Thailand:` → `ba[ht]`](https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-peso-baht&pinnedIds=17_13081_6%2C17_14659_6%2C12_82622_6%2C11_16313_6%2C11_42240_6%2C11_63944_6%2C9_18536_6%2C11_38164_6%2C10_60314_6%2C11_116644_6%2C9_101986_6%2C8_127767_5%2C4_55772_5%2C9_101986_5%2C11_53983_6%2C9_14974_6%2C8_62866_3%2C10_28752_6%2C4_1248_3%2C3_111633_3%2CE_38035_3%2C3_122795_5%2C2_37737_5%2C2_18590_5%2CE_30567_5%2C2_75517_5%2C2_128385_3%2C2_48203_3%2C1_72579_3&supernodes=%5B%5B%22Output+%5C%22+ba%5Bht%5D%5C%22%22%2C%2217_14659_6%22%2C%2217_13081_6%22%5D%2C%5B%22Thailand%22%2C%2211_63944_6%22%2C%2211_116644_6%22%2C%229_101986_6%22%5D%2C%5B%22Thailand%2FSEA%22%2C%222_18590_5%22%2C%222_75517_5%22%2C%222_37737_5%22%2C%223_122795_5%22%2C%224_55772_5%22%2C%228_127767_5%22%2C%229_101986_5%22%5D%2C%5B%22currency%22%2C%229_14974_6%22%2C%229_18536_6%22%2C%2210_60314_6%22%2C%2211_16313_6%22%2C%2211_42240_6%22%2C%2210_28752_6%22%2C%2211_53983_6%22%2C%2212_82622_6%22%2C%2211_38164_6%22%5D%2C%5B%22currency%22%2C%222_48203_3%22%2C%221_72579_3%22%2C%222_128385_3%22%2C%228_62866_3%22%2C%224_1248_3%22%2C%223_111633_3%22%5D%5D&clickedId=2_48203_3&pruningThreshold=0.7&densityThreshold=0.99)\n",
    "- [`Mexico: Spanish :: China:` → `Chinese / Mandarin`](https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-Spanish-Chinese&clerps=%5B%5B%2213010254%22%2C%22languages+%28upweight+English%29%22%5D%2C%5B%2211015631%22%2C%22language%22%5D%2C%5B%2212057025%22%2C%22language%22%5D%5D&pinnedIds=17_8620_6%2C17_83871_6%2C11_105416_6%2C13_10254_6%2C12_57025_6%2C11_97996_6%2CE_15506_3%2C11_15631_6%2C9_58293_3%2C10_115816_6%2C2_75517_3%2C8_48318_3%2C0_66748_3%2C11_57318_6%2C8_87687_5%2C6_37297_5%2C4_123730_5%2C4_36913_5%2CE_5734_5%2C2_10764_5%2C13_125862_6%2C11_121863_6%2C8_74127_3%2C10_2257_6%2C10_2257_3%2C9_49585_3%2C5_6113_3%2C4_103160_3%2C3_28630_3%2C1_30074_3%2C2_88127_3%2C8_86003_3%2C8_124301_3%2C2_27793_3%2C4_2682_3%2C2_116265_3%2C2_106637_3%2C4_81660_3%2C3_16228_3%2C3_52190_3%2C0_17320_3%2C2_10130_1&supernodes=%5B%5B%22Output+%5C%22+Chinese%5C%22+%28p%3D0.198%29%22%2C%2217_8620_6%22%2C%2217_83871_6%22%5D%2C%5B%22China%22%2C%228_87687_5%22%2C%226_37297_5%22%2C%224_123730_5%22%2C%222_10764_5%22%2C%224_36913_5%22%5D%2C%5B%22China%22%2C%2211_121863_6%22%2C%2213_125862_6%22%2C%2211_57318_6%22%5D%2C%5B%22languages+%28upweight+English%2C+Spanish%29%22%2C%2210_115816_6%22%2C%2210_2257_6%22%2C%2211_105416_6%22%2C%2213_10254_6%22%2C%2211_97996_6%22%2C%2211_15631_6%22%2C%2212_57025_6%22%5D%2C%5B%22Spanish%22%2C%222_10130_1%22%2C%223_16228_3%22%2C%222_106637_3%22%2C%223_28630_3%22%2C%222_88127_3%22%2C%221_30074_3%22%2C%223_52190_3%22%2C%224_81660_3%22%5D%2C%5B%22language%22%2C%228_124301_3%22%2C%2210_2257_3%22%2C%229_49585_3%22%2C%224_103160_3%22%2C%225_6113_3%22%2C%228_86003_3%22%2C%222_27793_3%22%2C%220_17320_3%22%5D%2C%5B%22Countries%22%2C%224_2682_3%22%2C%222_116265_3%22%2C%228_74127_3%22%2C%229_58293_3%22%2C%222_75517_3%22%2C%228_48318_3%22%2C%220_66748_3%22%5D%5D&clickedId=2_10130_1)\n",
    "- [`Mexico: peso :: China:` → `yuan`](https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-peso-yuan&clerps=%5B%5D&pinnedIds=17_73383_6%2C11_57318_6%2C13_125862_6%2C11_16313_6%2C12_116632_6%2C11_42240_6%2C11_121863_6%2C11_57318_5%2C12_82622_6%2C8_87687_5%2C4_123730_5%2C2_10764_5%2C4_36913_5%2C6_37297_5%2CE_5734_5%2C10_60314_6%2C9_18536_6%2C9_14974_6%2C8_62866_3%2C4_1248_3%2C10_28752_6%2C3_111633_3%2CE_38035_3%2C4_1248_4%2C2_128385_3%2C2_48203_3%2C1_72579_3%2C10_63648_6%2C9_42751_5%2C10_70313_5%2C2_52306_5%2C7_96333_5%2C5_109443_5%2C4_5554_5%2C3_85563_5%2C2_40398_5%2C0_124580_5&supernodes=%5B%5B%22currency%22%2C%228_62866_3%22%2C%224_1248_3%22%2C%223_111633_3%22%2C%222_48203_3%22%2C%221_72579_3%22%2C%222_128385_3%22%2C%224_1248_4%22%5D%2C%5B%22currency%22%2C%2211_42240_6%22%2C%229_18536_6%22%2C%2212_82622_6%22%2C%2212_116632_6%22%2C%2211_16313_6%22%2C%2210_60314_6%22%2C%229_14974_6%22%2C%2210_28752_6%22%5D%2C%5B%22China%22%2C%2210_63648_6%22%2C%2213_125862_6%22%2C%2211_121863_6%22%2C%2211_57318_6%22%5D%2C%5B%22China%22%2C%225_109443_5%22%2C%224_5554_5%22%2C%227_96333_5%22%2C%2210_70313_5%22%2C%222_52306_5%22%2C%229_42751_5%22%2C%2211_57318_5%22%2C%228_87687_5%22%2C%222_10764_5%22%2C%224_123730_5%22%2C%224_36913_5%22%2C%226_37297_5%22%2C%223_85563_5%22%2C%222_40398_5%22%2C%220_124580_5%22%5D%5D&clickedId=2_10764_5)\n",
    "- [`Mexico: peso :: Europe:` → `Euro`](https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-peso-euro&pinnedIds=17_20026_6%2C17_18140_6%2C11_16313_6%2C11_42240_6%2C12_116632_6%2C11_68469_6%2CE_38035_3%2C8_120350_5%2C2_110470_5%2CE_4606_5%2C10_86823_6%2C9_60773_5%2C10_125848_5%2C7_81969_5%2C8_25362_5%2C0_32561_5%2C1_49441_5%2C0_54175_5%2C0_80571_5%2C10_60314_6%2C9_18536_6%2C8_62866_3%2C4_1248_3%2C3_111633_3%2C9_14974_6%2C4_1248_4%2C7_73270_3%2C6_128487_3%2C2_128385_3%2C2_48203_3%2C12_59559_6%2C13_50738_6&supernodes=%5B%5B%22Output+%5C%22+euro%5C%22+%28p%3D0.447%29%22%2C%2217_18140_6%22%2C%2217_20026_6%22%5D%2C%5B%22Europe%22%2C%228_25362_5%22%2C%2210_125848_5%22%2C%2210_86823_6%22%2C%228_120350_5%22%2C%227_81969_5%22%2C%229_60773_5%22%2C%222_110470_5%22%2C%221_49441_5%22%2C%220_32561_5%22%2C%220_54175_5%22%2C%220_80571_5%22%5D%2C%5B%22currency%22%2C%228_62866_3%22%2C%224_1248_4%22%2C%222_128385_3%22%2C%222_48203_3%22%2C%223_111633_3%22%2C%224_1248_3%22%2C%226_128487_3%22%2C%227_73270_3%22%5D%2C%5B%22currency+%2F+upweight+dollar+%2F+cent%22%2C%229_18536_6%22%2C%229_14974_6%22%2C%2211_16313_6%22%2C%2210_60314_6%22%2C%2211_42240_6%22%2C%2212_116632_6%22%5D%2C%5B%22EU%22%2C%2212_59559_6%22%2C%2213_50738_6%22%2C%2211_68469_6%22%5D%5D&pruningThreshold=0.7&densityThreshold=0.99)\n",
    "- [`grass: green sky: blue corn: yellow carrot: orange strawberry:` → `red`](https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-strawberry-red&clerps=%5B%5B%2210021529%22%2C%22tokens+starting+with+G+%28and+those+after+them%29%22%5D%2C%5B%2212076178%22%2C%22red%22%5D%2C%5B%2210004161%22%2C%22red%22%5D%2C%5B%229086169%22%2C%22strawberry%2Fcherry%2Fberry%22%5D%5D&pinnedIds=17_2579_14%2C12_65335_14%2C12_83896_14%2C12_76178_14%2C10_126188_14%2CE_25_14%2CE_19087_12%2CE_14071_9%2C9_70369_14%2C11_66261_14%2CE_6437_6%2C9_78428_14%2C10_4161_13%2CE_73700_13%2C9_86169_13%2C4_36203_12%2C8_90102_12%2C8_90102_9%2C9_85104_14%2C5_93398_13%2C12_65335_11%2C12_65335_8%2C8_90102_13%2C10_126188_13&clickedId=9_86169_13&supernodes=%5B%5B%22upweight+colors%22%2C%229_85104_14%22%2C%2212_83896_14%22%2C%2211_66261_14%22%2C%229_70369_14%22%2C%2210_126188_14%22%2C%229_78428_14%22%2C%2212_65335_14%22%5D%2C%5B%22colors+%28earlier+positions%29%22%2C%2212_65335_11%22%2C%2212_65335_8%22%2C%228_90102_9%22%2C%228_90102_12%22%2C%224_36203_12%22%5D%2C%5B%22colors+%28strawberry+position%29%22%2C%2210_126188_13%22%2C%225_93398_13%22%2C%228_90102_13%22%5D%5D&pruningThreshold=0.7&densityThreshold=0.99)\n",
    "- [`The opposite of \"small\" is \"` → `big`](https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-small-big&pinnedIds=17_17185_8%2C17_16548_8%2C17_96470_8%2C13_39546_8%2C12_12865_8%2C9_71406_5%2C8_103482_5%2C8_5662_5%2C1_109683_5%2C2_45394_5%2CE_9181_5%2C13_36841_8%2C13_113219_8%2C15_121948_8%2C12_4243_8%2C11_16593_8%2C10_70326_8%2CE_330_8%2C7_59699_8%2C8_107081_8%2C5_94705_8%2C1_100176_8%2C9_52437_8%2C11_67476_8&supernodes=%5B%5B%22Output+%5C%22large%2Fbig%2Fhuge%5C%22%22%2C%2217_17185_8%22%2C%2217_16548_8%22%2C%2217_96470_8%22%5D%2C%5B%22upweight+big%2Flarge%2Fhuge%22%2C%2213_39546_8%22%2C%2212_12865_8%22%5D%2C%5B%22big%22%2C%228_103482_5%22%2C%222_45394_5%22%5D%2C%5B%22small%22%2C%221_109683_5%22%2C%228_5662_5%22%5D%2C%5B%22quotes%22%2C%2215_121948_8%22%2C%2213_36841_8%22%2C%2213_113219_8%22%2C%2210_70326_8%22%2C%228_107081_8%22%2C%2212_4243_8%22%2C%2211_16593_8%22%2C%229_52437_8%22%2C%221_100176_8%22%2C%227_59699_8%22%2C%225_94705_8%22%5D%2C%5B%22size%22%2C%2211_67476_8%22%2C%229_71406_5%22%5D%5D&clickedId=11_67476_8&pruningThreshold=0.7&densityThreshold=0.99)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we'll perform some imports, load our model, and define a helper function. For more details on this, and on interventions more generally, see `attribute_demo.ipynb` and `intervention_demo.ipynb`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import namedtuple\n",
    "from functools import partial\n",
    "\n",
    "import torch \n",
    "\n",
    "from circuit_tracer import ReplacementModel\n",
    "\n",
    "from utils import display_topk_token_predictions, display_generations_comparison\n",
    "from circuit_tracer.utils.decode_url_features import decode_url_features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "017425f2d0714753b711930a945c39d9",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Fetching 16 files:   0%|          | 0/16 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loaded pretrained model meta-llama/Llama-3.2-1B into HookedTransformer\n"
     ]
    }
   ],
   "source": [
    "model = ReplacementModel.from_pretrained(\"meta-llama/Llama-3.2-1B\", \"llama\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "Feature = namedtuple('Feature', ['layer', 'pos', 'feature_idx'])\n",
    "\n",
    "# a display function that needs the model's tokenizer\n",
    "display_topk_token_predictions = partial(display_topk_token_predictions, tokenizer=model.tokenizer)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example: Garden-Path Sentences\n",
    "\n",
    "This is the attribution graph for the prompt [`Because the manager retired the project` → `, / was`](https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-garden-path-nps&pinnedIds=17_11_6%2C17_574_6%2CE_2447_6%2C12_123901_6%2C12_105590_6%2C10_117209_4%2CE_22311_4%2C8_3291_1%2CE_18433_1%2C3_33599_1%2C2_50381_1%2C2_60861_1%2C3_9634_1%2C9_102504_6%2C10_117209_6&clickedId=9_102504_6&supernodes=%5B%5B%22because%22%2C%222_50381_1%22%2C%228_3291_1%22%2C%223_9634_1%22%2C%222_60861_1%22%2C%223_33599_1%22%5D%2C%5B%22subject+nouns%22%2C%2212_105590_6%22%2C%229_102504_6%22%5D%5D&pruningThreshold=0.7&densityThreshold=0.99&clerps=%5B%5B%2210_10117209_4%22%2C%22End+of+subordinate+clause%22%5D%2C%5B%2210_10117209_6%22%2C%22End+of+subordinate+clause%22%5D%2C%5B%2212_12123901_6%22%2C%22End+of+subordinate+clause%22%5D%5D). \n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/safety-research/circuit-tracer/main/demos/img/llama/gp-nps.png\" width=\"400\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is a garden-path sentence; it has two possible readings:\n",
    "1. Because the manager retired the project[, all work on it was discontinued.]\n",
    "2. Because the manager retired[,] the project [was delayed by a month.]\n",
    "\n",
    "That is, the subordinate clause (\"Because...,\") either ends after \"project\" (1), or after \"retired\" (2). In case (1), \",\" is a likely continuation; in (2), \"was\" is more likely.\n",
    "\n",
    "We can find two features that correspond to ends of subordinate clauses in the late case (1) and the early case (2). Currently, the model prefers reading (1); we can make it prefer (2) by zeroing out the late subordinate clause feature and highly upweighting the early one. Crucially, the early feature does not directly upweight \"was\"; it isn't even at the final position."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "end_subord_early10 = Feature(layer=10,pos=4,feature_idx=117209)\n",
    "end_subord_late10 = Feature(layer=10,pos=6,feature_idx=117209)\n",
    "end_subord_late12 = Feature(layer=12,pos=6,feature_idx=123901)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    .token-viz {\n",
       "        font-family: system-ui, -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;\n",
       "        margin-bottom: 10px;\n",
       "        max-width: 700px;\n",
       "    }\n",
       "    .token-viz .header {\n",
       "        font-weight: bold;\n",
       "        font-size: 14px;\n",
       "        margin-bottom: 3px;\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        color: white;\n",
       "        display: inline-block;\n",
       "    }\n",
       "    .token-viz .sentence {\n",
       "        background-color: rgba(200, 200, 200, 0.2);\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        border: 1px solid rgba(100, 100, 100, 0.5);\n",
       "        font-family: monospace;\n",
       "        margin-bottom: 8px;\n",
       "        font-weight: 500;\n",
       "        font-size: 14px;\n",
       "    }\n",
       "    .token-viz table {\n",
       "        width: 100%;\n",
       "        border-collapse: collapse;\n",
       "        margin-bottom: 8px;\n",
       "        font-size: 13px;\n",
       "        table-layout: fixed;\n",
       "    }\n",
       "    .token-viz th {\n",
       "        text-align: left;\n",
       "        padding: 4px 6px;\n",
       "        font-weight: bold;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        background-color: rgba(200, 200, 200, 0.3);\n",
       "    }\n",
       "    .token-viz td {\n",
       "        padding: 3px 6px;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        font-weight: 500;\n",
       "        overflow: hidden;\n",
       "        text-overflow: ellipsis;\n",
       "        white-space: nowrap;\n",
       "    }\n",
       "    .token-viz .token-col {\n",
       "        width: 20%;\n",
       "    }\n",
       "    .token-viz .prob-col {\n",
       "        width: 15%;\n",
       "    }\n",
       "    .token-viz .dist-col {\n",
       "        width: 65%;\n",
       "    }\n",
       "    .token-viz .monospace {\n",
       "        font-family: monospace;\n",
       "    }\n",
       "    .token-viz .bar-container {\n",
       "        display: flex;\n",
       "        align-items: center;\n",
       "    }\n",
       "    .token-viz .bar {\n",
       "        height: 12px;\n",
       "        min-width: 2px;\n",
       "    }\n",
       "    .token-viz .bar-text {\n",
       "        margin-left: 6px;\n",
       "        font-weight: 500;\n",
       "        font-size: 12px;\n",
       "    }\n",
       "    .token-viz .even-row {\n",
       "        background-color: rgba(240, 240, 240, 0.1);\n",
       "    }\n",
       "    .token-viz .odd-row {\n",
       "        background-color: rgba(255, 255, 255, 0.1);\n",
       "    }\n",
       "    </style>\n",
       "    \n",
       "    <div class=\"token-viz\">\n",
       "        <div class=\"header\" style=\"background-color: #555555;\">Input Sentence:</div>\n",
       "        <div class=\"sentence\">Because the manager retired the project</div>\n",
       "        \n",
       "        <div>\n",
       "            <div class=\"header\" style=\"background-color: #2471A3;\">Original Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\",\">,</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.264</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 100%;\"></div>\n",
       "                                <span class=\"bar-text\">26.4%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" in\"> in</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.060</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 22%;\"></div>\n",
       "                                <span class=\"bar-text\">6.0%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" and\"> and</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.049</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 18%;\"></div>\n",
       "                                <span class=\"bar-text\">4.9%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" to\"> to</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.046</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 17%;\"></div>\n",
       "                                <span class=\"bar-text\">4.6%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" manager\"> manager</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.044</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 16%;\"></div>\n",
       "                                <span class=\"bar-text\">4.4%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "            \n",
       "            <div class=\"header\" style=\"background-color: #27AE60;\">New Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" was\"> was</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.192</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 72%;\"></div>\n",
       "                                <span class=\"bar-text\">19.2%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" is\"> is</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.123</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 46%;\"></div>\n",
       "                                <span class=\"bar-text\">12.3%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\",\">,</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.063</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 23%;\"></div>\n",
       "                                <span class=\"bar-text\">6.3%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" manager\"> manager</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.049</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 18%;\"></div>\n",
       "                                <span class=\"bar-text\">4.9%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" has\"> has</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.038</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 14%;\"></div>\n",
       "                                <span class=\"bar-text\">3.8%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "        </div>\n",
       "    </div>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "s = \"Because the manager retired the project\"\n",
    "_, activations = model.get_activations(s, sparse=True)\n",
    "intervention_tuples = [(*end_subord_early10, activations[end_subord_early10]*15),\n",
    "                       (*end_subord_late10, 0), (*end_subord_late12, 0)]\n",
    "\n",
    "with torch.inference_mode():\n",
    "    original_logits = model(s)\n",
    "\n",
    "    # in most examples, we freeze the attention, as this we assume it's static when performing attribution.\n",
    "    # but in this one, unfreezing the attention actually changes model behavior more\n",
    "    new_logits, _ = model.feature_intervention(s, intervention_tuples, freeze_attention=False)\n",
    "\n",
    "display_topk_token_predictions(s, original_logits, new_logits)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also view what sentences the model generates, and confirm that they match the expected readings pre- and post-intervention. That is, pre-intervention, the model should generate sentences where \"the project\" is the object of \"retired\". Post-intervention, the model should generate sentences where \"the project\" is the subject of a new clause."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    .generations-viz {\n",
       "        font-family: system-ui, -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;\n",
       "        margin-bottom: 12px;\n",
       "        font-size: 13px;\n",
       "        max-width: 700px;\n",
       "    }\n",
       "    .generations-viz .section-header {\n",
       "        font-weight: bold;\n",
       "        font-size: 14px;\n",
       "        margin: 10px 0 5px 0;\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        color: white;\n",
       "        display: block;\n",
       "    }\n",
       "    .generations-viz .pre-intervention-header {\n",
       "        background-color: #2471A3;\n",
       "    }\n",
       "    .generations-viz .post-intervention-header {\n",
       "        background-color: #27AE60;\n",
       "    }\n",
       "    .generations-viz .generation-container {\n",
       "        margin-bottom: 8px;\n",
       "        padding: 3px;\n",
       "        border-left: 3px solid rgba(100, 100, 100, 0.5);\n",
       "    }\n",
       "    .generations-viz .generation-text {\n",
       "        background-color: rgba(200, 200, 200, 0.2);\n",
       "        padding: 6px 8px;\n",
       "        border-radius: 3px;\n",
       "        border: 1px solid rgba(100, 100, 100, 0.5);\n",
       "        font-family: monospace;\n",
       "        font-weight: 500;\n",
       "        white-space: pre-wrap;\n",
       "        line-height: 1.2;\n",
       "        font-size: 13px;\n",
       "        overflow-x: auto;\n",
       "    }\n",
       "    .generations-viz .base-text {\n",
       "        color: rgba(100, 100, 100, 0.9);\n",
       "    }\n",
       "    .generations-viz .new-text {\n",
       "        background-color: rgba(255, 255, 0, 0.25);\n",
       "        font-weight: bold;\n",
       "        padding: 1px 0;\n",
       "        border-radius: 2px;\n",
       "    }\n",
       "    .generations-viz .pre-intervention-item {\n",
       "        border-left-color: #2471A3;\n",
       "    }\n",
       "    .generations-viz .post-intervention-item {\n",
       "        border-left-color: #27AE60;\n",
       "    }\n",
       "    .generations-viz .generation-number {\n",
       "        font-weight: bold;\n",
       "        margin-bottom: 3px;\n",
       "        color: rgba(70, 70, 70, 0.9);\n",
       "        font-size: 12px;\n",
       "    }\n",
       "    </style>\n",
       "    \n",
       "    <div class=\"generations-viz\">\n",
       "    \n",
       "    <div class=\"section-header pre-intervention-header\">Pre-intervention generations:</div>\n",
       "    \n",
       "        <div class=\"generation-container pre-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 1</div>\n",
       "            <div class=\"generation-text\"><span class=\"base-text\">Because the manager retired the project</span><span class=\"new-text\">, I am now the project manager. I am</span></div>\n",
       "        </div>\n",
       "        \n",
       "        <div class=\"generation-container pre-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 2</div>\n",
       "            <div class=\"generation-text\"><span class=\"base-text\">Because the manager retired the project</span><span class=\"new-text\">, he has no further involvement in the project.</span></div>\n",
       "        </div>\n",
       "        \n",
       "        <div class=\"generation-container pre-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 3</div>\n",
       "            <div class=\"generation-text\"><span class=\"base-text\">Because the manager retired the project</span><span class=\"new-text\">, the user is now stuck with the old version</span></div>\n",
       "        </div>\n",
       "        \n",
       "        <div class=\"generation-container pre-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 4</div>\n",
       "            <div class=\"generation-text\"><span class=\"base-text\">Because the manager retired the project</span><span class=\"new-text\">, the project manager is no longer available for questions</span></div>\n",
       "        </div>\n",
       "        \n",
       "        <div class=\"generation-container pre-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 5</div>\n",
       "            <div class=\"generation-text\"><span class=\"base-text\">Because the manager retired the project</span><span class=\"new-text\">, we have been looking for a replacement. I</span></div>\n",
       "        </div>\n",
       "        \n",
       "    <div class=\"section-header post-intervention-header\">Post-intervention generations:</div>\n",
       "    \n",
       "        <div class=\"generation-container post-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 1</div>\n",
       "            <div class=\"generation-text\"><span class=\"base-text\">Because the manager retired the project</span><span class=\"new-text\"> is no longer active.\n",
       "The project was active from</span></div>\n",
       "        </div>\n",
       "        \n",
       "        <div class=\"generation-container post-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 2</div>\n",
       "            <div class=\"generation-text\"><span class=\"base-text\">Because the manager retired the project</span><span class=\"new-text\">, I am the lead developer on the project now</span></div>\n",
       "        </div>\n",
       "        \n",
       "        <div class=\"generation-container post-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 3</div>\n",
       "            <div class=\"generation-text\"><span class=\"base-text\">Because the manager retired the project</span><span class=\"new-text\"> is being transferred to an in-house project manager.</span></div>\n",
       "        </div>\n",
       "        \n",
       "        <div class=\"generation-container post-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 4</div>\n",
       "            <div class=\"generation-text\"><span class=\"base-text\">Because the manager retired the project</span><span class=\"new-text\"> was not able to be completed. The project was</span></div>\n",
       "        </div>\n",
       "        \n",
       "        <div class=\"generation-container post-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 5</div>\n",
       "            <div class=\"generation-text\"><span class=\"base-text\">Because the manager retired the project</span><span class=\"new-text\"> was taken over by the team of Innovent</span></div>\n",
       "        </div>\n",
       "        \n",
       "    </div>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "pre_intervention_generation = [model.generate(s, do_sample=True, use_past_kv_cache=False, verbose=False, temperature=0.5) for _ in range(5)]\n",
    "post_intervention_generation = [model.feature_intervention_generate(s, intervention_tuples, do_sample=True, verbose=False, temperature=0.5)[0] for _ in range(5)]\n",
    "display_generations_comparison(s, pre_intervention_generation, post_intervention_generation)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Both interventions work! But note that the largest effect comes from upweighting the early subordinate clause end feature, rather than downweighting the later end features. We can show this by comparing the effects of intervening on each independently. Ablating just these late \"end subordinate clause\" features is not very effective."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    .token-viz {\n",
       "        font-family: system-ui, -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;\n",
       "        margin-bottom: 10px;\n",
       "        max-width: 700px;\n",
       "    }\n",
       "    .token-viz .header {\n",
       "        font-weight: bold;\n",
       "        font-size: 14px;\n",
       "        margin-bottom: 3px;\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        color: white;\n",
       "        display: inline-block;\n",
       "    }\n",
       "    .token-viz .sentence {\n",
       "        background-color: rgba(200, 200, 200, 0.2);\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        border: 1px solid rgba(100, 100, 100, 0.5);\n",
       "        font-family: monospace;\n",
       "        margin-bottom: 8px;\n",
       "        font-weight: 500;\n",
       "        font-size: 14px;\n",
       "    }\n",
       "    .token-viz table {\n",
       "        width: 100%;\n",
       "        border-collapse: collapse;\n",
       "        margin-bottom: 8px;\n",
       "        font-size: 13px;\n",
       "        table-layout: fixed;\n",
       "    }\n",
       "    .token-viz th {\n",
       "        text-align: left;\n",
       "        padding: 4px 6px;\n",
       "        font-weight: bold;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        background-color: rgba(200, 200, 200, 0.3);\n",
       "    }\n",
       "    .token-viz td {\n",
       "        padding: 3px 6px;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        font-weight: 500;\n",
       "        overflow: hidden;\n",
       "        text-overflow: ellipsis;\n",
       "        white-space: nowrap;\n",
       "    }\n",
       "    .token-viz .token-col {\n",
       "        width: 20%;\n",
       "    }\n",
       "    .token-viz .prob-col {\n",
       "        width: 15%;\n",
       "    }\n",
       "    .token-viz .dist-col {\n",
       "        width: 65%;\n",
       "    }\n",
       "    .token-viz .monospace {\n",
       "        font-family: monospace;\n",
       "    }\n",
       "    .token-viz .bar-container {\n",
       "        display: flex;\n",
       "        align-items: center;\n",
       "    }\n",
       "    .token-viz .bar {\n",
       "        height: 12px;\n",
       "        min-width: 2px;\n",
       "    }\n",
       "    .token-viz .bar-text {\n",
       "        margin-left: 6px;\n",
       "        font-weight: 500;\n",
       "        font-size: 12px;\n",
       "    }\n",
       "    .token-viz .even-row {\n",
       "        background-color: rgba(240, 240, 240, 0.1);\n",
       "    }\n",
       "    .token-viz .odd-row {\n",
       "        background-color: rgba(255, 255, 255, 0.1);\n",
       "    }\n",
       "    </style>\n",
       "    \n",
       "    <div class=\"token-viz\">\n",
       "        <div class=\"header\" style=\"background-color: #555555;\">Input Sentence:</div>\n",
       "        <div class=\"sentence\">Because the manager retired the project</div>\n",
       "        \n",
       "        <div>\n",
       "            <div class=\"header\" style=\"background-color: #2471A3;\">Original Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\",\">,</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.264</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 100%;\"></div>\n",
       "                                <span class=\"bar-text\">26.4%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" in\"> in</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.060</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 22%;\"></div>\n",
       "                                <span class=\"bar-text\">6.0%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" and\"> and</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.049</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 18%;\"></div>\n",
       "                                <span class=\"bar-text\">4.9%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" to\"> to</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.046</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 17%;\"></div>\n",
       "                                <span class=\"bar-text\">4.6%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" manager\"> manager</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.044</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 16%;\"></div>\n",
       "                                <span class=\"bar-text\">4.4%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "            \n",
       "            <div class=\"header\" style=\"background-color: #27AE60;\">New Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\",\">,</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.217</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 82%;\"></div>\n",
       "                                <span class=\"bar-text\">21.7%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" in\"> in</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.064</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 24%;\"></div>\n",
       "                                <span class=\"bar-text\">6.4%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" and\"> and</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.052</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 19%;\"></div>\n",
       "                                <span class=\"bar-text\">5.2%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" to\"> to</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.048</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 18%;\"></div>\n",
       "                                <span class=\"bar-text\">4.8%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" manager\"> manager</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.047</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 17%;\"></div>\n",
       "                                <span class=\"bar-text\">4.7%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "        </div>\n",
       "    </div>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "s = \"Because the manager retired the project\"\n",
    "_, activations = model.get_activations(s, sparse=True)\n",
    "intervention_tuples = [(*end_subord_late10, 0), (*end_subord_late12, 0)]\n",
    "\n",
    "with torch.inference_mode():\n",
    "    original_logits = model(s)\n",
    "    new_logits, _ = model.feature_intervention(s, intervention_tuples, freeze_attention=False)\n",
    "\n",
    "display_topk_token_predictions(s, original_logits, new_logits)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In contrast, intervening solely on the subordinate-clause-end feature at the early position is effective."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    .token-viz {\n",
       "        font-family: system-ui, -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;\n",
       "        margin-bottom: 10px;\n",
       "        max-width: 700px;\n",
       "    }\n",
       "    .token-viz .header {\n",
       "        font-weight: bold;\n",
       "        font-size: 14px;\n",
       "        margin-bottom: 3px;\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        color: white;\n",
       "        display: inline-block;\n",
       "    }\n",
       "    .token-viz .sentence {\n",
       "        background-color: rgba(200, 200, 200, 0.2);\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        border: 1px solid rgba(100, 100, 100, 0.5);\n",
       "        font-family: monospace;\n",
       "        margin-bottom: 8px;\n",
       "        font-weight: 500;\n",
       "        font-size: 14px;\n",
       "    }\n",
       "    .token-viz table {\n",
       "        width: 100%;\n",
       "        border-collapse: collapse;\n",
       "        margin-bottom: 8px;\n",
       "        font-size: 13px;\n",
       "        table-layout: fixed;\n",
       "    }\n",
       "    .token-viz th {\n",
       "        text-align: left;\n",
       "        padding: 4px 6px;\n",
       "        font-weight: bold;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        background-color: rgba(200, 200, 200, 0.3);\n",
       "    }\n",
       "    .token-viz td {\n",
       "        padding: 3px 6px;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        font-weight: 500;\n",
       "        overflow: hidden;\n",
       "        text-overflow: ellipsis;\n",
       "        white-space: nowrap;\n",
       "    }\n",
       "    .token-viz .token-col {\n",
       "        width: 20%;\n",
       "    }\n",
       "    .token-viz .prob-col {\n",
       "        width: 15%;\n",
       "    }\n",
       "    .token-viz .dist-col {\n",
       "        width: 65%;\n",
       "    }\n",
       "    .token-viz .monospace {\n",
       "        font-family: monospace;\n",
       "    }\n",
       "    .token-viz .bar-container {\n",
       "        display: flex;\n",
       "        align-items: center;\n",
       "    }\n",
       "    .token-viz .bar {\n",
       "        height: 12px;\n",
       "        min-width: 2px;\n",
       "    }\n",
       "    .token-viz .bar-text {\n",
       "        margin-left: 6px;\n",
       "        font-weight: 500;\n",
       "        font-size: 12px;\n",
       "    }\n",
       "    .token-viz .even-row {\n",
       "        background-color: rgba(240, 240, 240, 0.1);\n",
       "    }\n",
       "    .token-viz .odd-row {\n",
       "        background-color: rgba(255, 255, 255, 0.1);\n",
       "    }\n",
       "    </style>\n",
       "    \n",
       "    <div class=\"token-viz\">\n",
       "        <div class=\"header\" style=\"background-color: #555555;\">Input Sentence:</div>\n",
       "        <div class=\"sentence\">Because the manager retired the project</div>\n",
       "        \n",
       "        <div>\n",
       "            <div class=\"header\" style=\"background-color: #2471A3;\">Original Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\",\">,</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.264</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 100%;\"></div>\n",
       "                                <span class=\"bar-text\">26.4%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" in\"> in</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.060</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 22%;\"></div>\n",
       "                                <span class=\"bar-text\">6.0%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" and\"> and</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.049</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 18%;\"></div>\n",
       "                                <span class=\"bar-text\">4.9%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" to\"> to</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.046</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 17%;\"></div>\n",
       "                                <span class=\"bar-text\">4.6%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" manager\"> manager</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.044</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 16%;\"></div>\n",
       "                                <span class=\"bar-text\">4.4%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "            \n",
       "            <div class=\"header\" style=\"background-color: #27AE60;\">New Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" was\"> was</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.173</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 65%;\"></div>\n",
       "                                <span class=\"bar-text\">17.3%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" is\"> is</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.102</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 38%;\"></div>\n",
       "                                <span class=\"bar-text\">10.2%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\",\">,</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.095</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 35%;\"></div>\n",
       "                                <span class=\"bar-text\">9.5%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" manager\"> manager</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.050</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 18%;\"></div>\n",
       "                                <span class=\"bar-text\">5.0%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" has\"> has</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.035</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 13%;\"></div>\n",
       "                                <span class=\"bar-text\">3.5%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "        </div>\n",
       "    </div>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "s = \"Because the manager retired the project\"\n",
    "_, activations = model.get_activations(s, sparse=True)\n",
    "intervention_tuples = [(*end_subord_early10, activations[end_subord_early10]*10)]\n",
    "\n",
    "with torch.inference_mode():\n",
    "    original_logits = model(s)\n",
    "    new_logits, _ = model.feature_intervention(s, intervention_tuples, freeze_attention=False)\n",
    "\n",
    "display_topk_token_predictions(s, original_logits, new_logits)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example: Analogies about countries, languages, and currencies\n",
    "\n",
    "In this example, we'll play with variations on the prompt [`Mexico: peso :: China:` → `yuan`](https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-peso-yuan&clerps=%5B%5D&pinnedIds=17_73383_6%2C11_57318_6%2C13_125862_6%2C11_16313_6%2C12_116632_6%2C11_42240_6%2C11_121863_6%2C11_57318_5%2C12_82622_6%2C8_87687_5%2C4_123730_5%2C2_10764_5%2C4_36913_5%2C6_37297_5%2CE_5734_5%2C10_60314_6%2C9_18536_6%2C9_14974_6%2C8_62866_3%2C4_1248_3%2C10_28752_6%2C3_111633_3%2CE_38035_3%2C4_1248_4%2C2_128385_3%2C2_48203_3%2C1_72579_3%2C10_63648_6%2C9_42751_5%2C10_70313_5%2C2_52306_5%2C7_96333_5%2C5_109443_5%2C4_5554_5%2C3_85563_5%2C2_40398_5%2C0_124580_5&supernodes=%5B%5B%22currency%22%2C%228_62866_3%22%2C%224_1248_3%22%2C%223_111633_3%22%2C%222_48203_3%22%2C%221_72579_3%22%2C%222_128385_3%22%2C%224_1248_4%22%5D%2C%5B%22currency%22%2C%2211_42240_6%22%2C%229_18536_6%22%2C%2212_82622_6%22%2C%2212_116632_6%22%2C%2211_16313_6%22%2C%2210_60314_6%22%2C%229_14974_6%22%2C%2210_28752_6%22%5D%2C%5B%22China%22%2C%2210_63648_6%22%2C%2213_125862_6%22%2C%2211_121863_6%22%2C%2211_57318_6%22%5D%2C%5B%22China%22%2C%225_109443_5%22%2C%224_5554_5%22%2C%227_96333_5%22%2C%2210_70313_5%22%2C%222_52306_5%22%2C%229_42751_5%22%2C%2211_57318_5%22%2C%228_87687_5%22%2C%222_10764_5%22%2C%224_123730_5%22%2C%224_36913_5%22%2C%226_37297_5%22%2C%223_85563_5%22%2C%222_40398_5%22%2C%220_124580_5%22%5D%5D&clickedId=2_10764_5). We'll attempt to change the country and the topic of the analogy shown here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_peso_china = 'Mexico: peso :: China:'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One easy first example we can try: let's turn off the Chinese features in the this example, and see how output changes.\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/safety-research/circuit-tracer/main/demos/img/llama/peso-china.png\" width=\"400\">\n",
    "\n",
    "Rather than intervene on many features, and write them all out, we'll use this convenience function to extract them from the URL."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "china_yuan_url = 'https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-peso-yuan&pinnedIds=17_73383_6%2C11_57318_6%2C13_125862_6%2C11_16313_6%2C12_116632_6%2C11_42240_6%2C11_121863_6%2C11_57318_5%2C12_82622_6%2C8_87687_5%2C4_123730_5%2C2_10764_5%2C4_36913_5%2C6_37297_5%2CE_5734_5%2C10_60314_6%2C9_18536_6%2C9_14974_6%2C8_62866_3%2C4_1248_3%2C10_28752_6%2C3_111633_3%2CE_38035_3%2C4_1248_4%2C2_128385_3%2C2_48203_3%2C1_72579_3%2C10_63648_6%2C9_42751_5%2C10_70313_5%2C2_52306_5%2C7_96333_5%2C5_109443_5%2C4_5554_5%2C3_85563_5%2C2_40398_5%2C0_124580_5&supernodes=%5B%5B%22currency%22%2C%228_62866_3%22%2C%224_1248_3%22%2C%223_111633_3%22%2C%222_48203_3%22%2C%221_72579_3%22%2C%222_128385_3%22%2C%224_1248_4%22%5D%2C%5B%22currency%22%2C%2211_42240_6%22%2C%229_18536_6%22%2C%2212_82622_6%22%2C%2212_116632_6%22%2C%2211_16313_6%22%2C%2210_60314_6%22%2C%229_14974_6%22%2C%2210_28752_6%22%5D%2C%5B%22China%22%2C%2210_63648_6%22%2C%2213_125862_6%22%2C%2211_121863_6%22%2C%2211_57318_6%22%5D%2C%5B%22China%22%2C%225_109443_5%22%2C%224_5554_5%22%2C%227_96333_5%22%2C%2210_70313_5%22%2C%222_52306_5%22%2C%229_42751_5%22%2C%2211_57318_5%22%2C%228_87687_5%22%2C%222_10764_5%22%2C%224_123730_5%22%2C%224_36913_5%22%2C%226_37297_5%22%2C%223_85563_5%22%2C%222_40398_5%22%2C%220_124580_5%22%5D%5D&clickedId=2_10764_5&pruningThreshold=0.7&densityThreshold=0.99'\n",
    "china_yuan_supernodes, _ = decode_url_features(china_yuan_url)\n",
    "chinese_features_china = china_yuan_supernodes['China (2)']  # These are at the \"China\" position"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    .token-viz {\n",
       "        font-family: system-ui, -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;\n",
       "        margin-bottom: 10px;\n",
       "        max-width: 700px;\n",
       "    }\n",
       "    .token-viz .header {\n",
       "        font-weight: bold;\n",
       "        font-size: 14px;\n",
       "        margin-bottom: 3px;\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        color: white;\n",
       "        display: inline-block;\n",
       "    }\n",
       "    .token-viz .sentence {\n",
       "        background-color: rgba(200, 200, 200, 0.2);\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        border: 1px solid rgba(100, 100, 100, 0.5);\n",
       "        font-family: monospace;\n",
       "        margin-bottom: 8px;\n",
       "        font-weight: 500;\n",
       "        font-size: 14px;\n",
       "    }\n",
       "    .token-viz table {\n",
       "        width: 100%;\n",
       "        border-collapse: collapse;\n",
       "        margin-bottom: 8px;\n",
       "        font-size: 13px;\n",
       "        table-layout: fixed;\n",
       "    }\n",
       "    .token-viz th {\n",
       "        text-align: left;\n",
       "        padding: 4px 6px;\n",
       "        font-weight: bold;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        background-color: rgba(200, 200, 200, 0.3);\n",
       "    }\n",
       "    .token-viz td {\n",
       "        padding: 3px 6px;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        font-weight: 500;\n",
       "        overflow: hidden;\n",
       "        text-overflow: ellipsis;\n",
       "        white-space: nowrap;\n",
       "    }\n",
       "    .token-viz .token-col {\n",
       "        width: 20%;\n",
       "    }\n",
       "    .token-viz .prob-col {\n",
       "        width: 15%;\n",
       "    }\n",
       "    .token-viz .dist-col {\n",
       "        width: 65%;\n",
       "    }\n",
       "    .token-viz .monospace {\n",
       "        font-family: monospace;\n",
       "    }\n",
       "    .token-viz .bar-container {\n",
       "        display: flex;\n",
       "        align-items: center;\n",
       "    }\n",
       "    .token-viz .bar {\n",
       "        height: 12px;\n",
       "        min-width: 2px;\n",
       "    }\n",
       "    .token-viz .bar-text {\n",
       "        margin-left: 6px;\n",
       "        font-weight: 500;\n",
       "        font-size: 12px;\n",
       "    }\n",
       "    .token-viz .even-row {\n",
       "        background-color: rgba(240, 240, 240, 0.1);\n",
       "    }\n",
       "    .token-viz .odd-row {\n",
       "        background-color: rgba(255, 255, 255, 0.1);\n",
       "    }\n",
       "    </style>\n",
       "    \n",
       "    <div class=\"token-viz\">\n",
       "        <div class=\"header\" style=\"background-color: #555555;\">Input Sentence:</div>\n",
       "        <div class=\"sentence\">Mexico: peso :: China:</div>\n",
       "        \n",
       "        <div>\n",
       "            <div class=\"header\" style=\"background-color: #2471A3;\">Original Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" yuan\"> yuan</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.513</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 100%;\"></div>\n",
       "                                <span class=\"bar-text\">51.3%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" ren\"> ren</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.306</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 59%;\"></div>\n",
       "                                <span class=\"bar-text\">30.6%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" Yuan\"> Yuan</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.033</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 6%;\"></div>\n",
       "                                <span class=\"bar-text\">3.3%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" Ren\"> Ren</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.022</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 4%;\"></div>\n",
       "                                <span class=\"bar-text\">2.2%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" R\"> R</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.019</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 3%;\"></div>\n",
       "                                <span class=\"bar-text\">1.9%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "            \n",
       "            <div class=\"header\" style=\"background-color: #27AE60;\">New Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" yen\"> yen</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.223</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 43%;\"></div>\n",
       "                                <span class=\"bar-text\">22.3%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" yuan\"> yuan</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.156</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 30%;\"></div>\n",
       "                                <span class=\"bar-text\">15.6%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" dollar\"> dollar</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.095</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 18%;\"></div>\n",
       "                                <span class=\"bar-text\">9.5%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" peso\"> peso</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.032</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 6%;\"></div>\n",
       "                                <span class=\"bar-text\">3.2%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" y\"> y</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.028</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 5%;\"></div>\n",
       "                                <span class=\"bar-text\">2.8%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "        </div>\n",
       "    </div>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "_, activations = model.get_activations(s_peso_china, sparse=True)\n",
    "logits = model(s_peso_china)\n",
    "\n",
    "interventions = [(*feature, 0) for feature in chinese_features_china]\n",
    "\n",
    "with torch.inference_mode():\n",
    "    original_logits = model(s_peso_china)\n",
    "    new_logits, new_activations = model.feature_intervention(s_peso_china, interventions)\n",
    "\n",
    "display_topk_token_predictions(s_peso_china, original_logits, new_logits)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This causes the position of the correct answer, \"yuan\", to fall, while \"yen\" takes its place. \n",
    "\n",
    "Can we replace \"yuan\" with another currency? We'll use the features from this circuit:\n",
    "[`Mexico: peso :: Thailand:` → `ba[ht]`](https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-peso-baht&pinnedIds=17_13081_6%2C17_14659_6%2C12_82622_6%2C11_16313_6%2C11_42240_6%2C11_63944_6%2C9_18536_6%2C11_38164_6%2C10_60314_6%2C11_116644_6%2C9_101986_6%2C8_127767_5%2C4_55772_5%2C9_101986_5%2C11_53983_6%2C9_14974_6%2C8_62866_3%2C10_28752_6%2C4_1248_3%2C3_111633_3%2CE_38035_3%2C3_122795_5%2C2_37737_5%2C2_18590_5%2CE_30567_5%2C2_75517_5%2C2_128385_3%2C2_48203_3%2C1_72579_3&supernodes=%5B%5B%22Output+%5C%22+ba%5Bht%5D%5C%22%22%2C%2217_14659_6%22%2C%2217_13081_6%22%5D%2C%5B%22Thailand%22%2C%2211_63944_6%22%2C%2211_116644_6%22%2C%229_101986_6%22%5D%2C%5B%22Thailand%2FSEA%22%2C%222_18590_5%22%2C%222_75517_5%22%2C%222_37737_5%22%2C%223_122795_5%22%2C%224_55772_5%22%2C%228_127767_5%22%2C%229_101986_5%22%5D%2C%5B%22currency%22%2C%229_14974_6%22%2C%229_18536_6%22%2C%2210_60314_6%22%2C%2211_16313_6%22%2C%2211_42240_6%22%2C%2210_28752_6%22%2C%2211_53983_6%22%2C%2212_82622_6%22%2C%2211_38164_6%22%5D%2C%5B%22currency%22%2C%222_48203_3%22%2C%221_72579_3%22%2C%222_128385_3%22%2C%228_62866_3%22%2C%224_1248_3%22%2C%223_111633_3%22%5D%5D&clickedId=2_48203_3&pruningThreshold=0.7&densityThreshold=0.99)\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/safety-research/circuit-tracer/main/demos/img/llama/peso-thailand.png\" width=\"400\">\n",
    "\n",
    "In this example, the analogy now asks for the currency of Thailand, the baht. If we turn off the China features, and turn on the Thailand features, the output currency should change."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_peso_thailand = 'Mexico: peso :: Thailand:'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "thailand_baht_url = 'https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-peso-baht&pinnedIds=17_13081_6,17_14659_6,12_82622_6,11_16313_6,11_42240_6,11_63944_6,9_18536_6,11_38164_6,10_60314_6,11_116644_6,9_101986_6,8_127767_5,4_55772_5,9_101986_5,11_53983_6,9_14974_6,8_62866_3,10_28752_6,4_1248_3,3_111633_3,E_38035_3,3_122795_5,2_37737_5,2_18590_5,E_30567_5,2_75517_5,2_128385_3,2_48203_3,1_72579_3&supernodes=%5B%5B%22Output+%5C%22+ba%5Bht%5D%5C%22+(p=0.913)%22,%2217_14659_6%22,%2217_13081_6%22%5D,%5B%22Thailand%22,%2211_63944_6%22,%2211_116644_6%22,%229_101986_6%22%5D,%5B%22Thailand/SEA%22,%222_18590_5%22,%222_75517_5%22,%222_37737_5%22,%223_122795_5%22,%224_55772_5%22,%228_127767_5%22,%229_101986_5%22%5D,%5B%22currency%22,%229_14974_6%22,%229_18536_6%22,%2210_60314_6%22,%2211_16313_6%22,%2211_42240_6%22,%2210_28752_6%22,%2211_53983_6%22,%2212_82622_6%22,%2211_38164_6%22%5D,%5B%22currency%22,%222_48203_3%22,%221_72579_3%22,%222_128385_3%22,%228_62866_3%22,%224_1248_3%22,%223_111633_3%22%5D%5D&clickedId=2_48203_3&pruningThreshold=0.7&densityThreshold=0.99'\n",
    "thailand_baht_supernodes, _ = decode_url_features(thailand_baht_url)\n",
    "thai_features_thailand = thailand_baht_supernodes['Thailand/SEA']  # These are at the \"Thailand\" position"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    .token-viz {\n",
       "        font-family: system-ui, -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;\n",
       "        margin-bottom: 10px;\n",
       "        max-width: 700px;\n",
       "    }\n",
       "    .token-viz .header {\n",
       "        font-weight: bold;\n",
       "        font-size: 14px;\n",
       "        margin-bottom: 3px;\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        color: white;\n",
       "        display: inline-block;\n",
       "    }\n",
       "    .token-viz .sentence {\n",
       "        background-color: rgba(200, 200, 200, 0.2);\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        border: 1px solid rgba(100, 100, 100, 0.5);\n",
       "        font-family: monospace;\n",
       "        margin-bottom: 8px;\n",
       "        font-weight: 500;\n",
       "        font-size: 14px;\n",
       "    }\n",
       "    .token-viz table {\n",
       "        width: 100%;\n",
       "        border-collapse: collapse;\n",
       "        margin-bottom: 8px;\n",
       "        font-size: 13px;\n",
       "        table-layout: fixed;\n",
       "    }\n",
       "    .token-viz th {\n",
       "        text-align: left;\n",
       "        padding: 4px 6px;\n",
       "        font-weight: bold;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        background-color: rgba(200, 200, 200, 0.3);\n",
       "    }\n",
       "    .token-viz td {\n",
       "        padding: 3px 6px;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        font-weight: 500;\n",
       "        overflow: hidden;\n",
       "        text-overflow: ellipsis;\n",
       "        white-space: nowrap;\n",
       "    }\n",
       "    .token-viz .token-col {\n",
       "        width: 20%;\n",
       "    }\n",
       "    .token-viz .prob-col {\n",
       "        width: 15%;\n",
       "    }\n",
       "    .token-viz .dist-col {\n",
       "        width: 65%;\n",
       "    }\n",
       "    .token-viz .monospace {\n",
       "        font-family: monospace;\n",
       "    }\n",
       "    .token-viz .bar-container {\n",
       "        display: flex;\n",
       "        align-items: center;\n",
       "    }\n",
       "    .token-viz .bar {\n",
       "        height: 12px;\n",
       "        min-width: 2px;\n",
       "    }\n",
       "    .token-viz .bar-text {\n",
       "        margin-left: 6px;\n",
       "        font-weight: 500;\n",
       "        font-size: 12px;\n",
       "    }\n",
       "    .token-viz .even-row {\n",
       "        background-color: rgba(240, 240, 240, 0.1);\n",
       "    }\n",
       "    .token-viz .odd-row {\n",
       "        background-color: rgba(255, 255, 255, 0.1);\n",
       "    }\n",
       "    </style>\n",
       "    \n",
       "    <div class=\"token-viz\">\n",
       "        <div class=\"header\" style=\"background-color: #555555;\">Input Sentence:</div>\n",
       "        <div class=\"sentence\">Mexico: peso :: China:</div>\n",
       "        \n",
       "        <div>\n",
       "            <div class=\"header\" style=\"background-color: #2471A3;\">Original Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" yuan\"> yuan</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.513</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 80%;\"></div>\n",
       "                                <span class=\"bar-text\">51.3%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" ren\"> ren</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.306</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 48%;\"></div>\n",
       "                                <span class=\"bar-text\">30.6%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" Yuan\"> Yuan</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.033</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 5%;\"></div>\n",
       "                                <span class=\"bar-text\">3.3%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" Ren\"> Ren</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.022</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 3%;\"></div>\n",
       "                                <span class=\"bar-text\">2.2%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" R\"> R</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.019</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 2%;\"></div>\n",
       "                                <span class=\"bar-text\">1.9%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "            \n",
       "            <div class=\"header\" style=\"background-color: #27AE60;\">New Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" ba\"> ba</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.634</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 100%;\"></div>\n",
       "                                <span class=\"bar-text\">63.4%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" yen\"> yen</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.041</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 6%;\"></div>\n",
       "                                <span class=\"bar-text\">4.1%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" dollar\"> dollar</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.018</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 2%;\"></div>\n",
       "                                <span class=\"bar-text\">1.8%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" peso\"> peso</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.015</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 2%;\"></div>\n",
       "                                <span class=\"bar-text\">1.5%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" yuan\"> yuan</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.015</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 2%;\"></div>\n",
       "                                <span class=\"bar-text\">1.5%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "        </div>\n",
       "    </div>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "_, baht_activations = model.get_activations(s_peso_thailand, sparse=True)\n",
    "logits = model(s_peso_china)\n",
    "\n",
    "interventions = [(*feature, 0) for feature in chinese_features_china]\n",
    "interventions+= [(*feature, baht_activations[feature]) for feature in thai_features_thailand]\n",
    "\n",
    "with torch.inference_mode():\n",
    "    original_logits = model(s_peso_china)\n",
    "    new_logits, new_activations = model.feature_intervention(s_peso_china, interventions)\n",
    "\n",
    "display_topk_token_predictions(s_peso_china, original_logits, new_logits)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notably, this is much harder to do when intervening at the later, colon position. This is probably because, when we intervene at the earlier position, we influence all downstream features (and errors) at the colon position as well! So if we miss some upstream features, it's okay. But if we intervene at the colon position, we need to intervene on every relevant feature more or less independently - and even then, there will be some error nodes that push towards \"yuan\". To get to the desired outcome, we need to intervene strongly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "# colon pos\n",
    "thai_features_colon = thailand_baht_supernodes['Thailand']  # These are at the \":\" position\n",
    "# colon pos\n",
    "chinese_features_colon = china_yuan_supernodes['China']  # These are at the \":\" position"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    .token-viz {\n",
       "        font-family: system-ui, -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;\n",
       "        margin-bottom: 10px;\n",
       "        max-width: 700px;\n",
       "    }\n",
       "    .token-viz .header {\n",
       "        font-weight: bold;\n",
       "        font-size: 14px;\n",
       "        margin-bottom: 3px;\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        color: white;\n",
       "        display: inline-block;\n",
       "    }\n",
       "    .token-viz .sentence {\n",
       "        background-color: rgba(200, 200, 200, 0.2);\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        border: 1px solid rgba(100, 100, 100, 0.5);\n",
       "        font-family: monospace;\n",
       "        margin-bottom: 8px;\n",
       "        font-weight: 500;\n",
       "        font-size: 14px;\n",
       "    }\n",
       "    .token-viz table {\n",
       "        width: 100%;\n",
       "        border-collapse: collapse;\n",
       "        margin-bottom: 8px;\n",
       "        font-size: 13px;\n",
       "        table-layout: fixed;\n",
       "    }\n",
       "    .token-viz th {\n",
       "        text-align: left;\n",
       "        padding: 4px 6px;\n",
       "        font-weight: bold;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        background-color: rgba(200, 200, 200, 0.3);\n",
       "    }\n",
       "    .token-viz td {\n",
       "        padding: 3px 6px;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        font-weight: 500;\n",
       "        overflow: hidden;\n",
       "        text-overflow: ellipsis;\n",
       "        white-space: nowrap;\n",
       "    }\n",
       "    .token-viz .token-col {\n",
       "        width: 20%;\n",
       "    }\n",
       "    .token-viz .prob-col {\n",
       "        width: 15%;\n",
       "    }\n",
       "    .token-viz .dist-col {\n",
       "        width: 65%;\n",
       "    }\n",
       "    .token-viz .monospace {\n",
       "        font-family: monospace;\n",
       "    }\n",
       "    .token-viz .bar-container {\n",
       "        display: flex;\n",
       "        align-items: center;\n",
       "    }\n",
       "    .token-viz .bar {\n",
       "        height: 12px;\n",
       "        min-width: 2px;\n",
       "    }\n",
       "    .token-viz .bar-text {\n",
       "        margin-left: 6px;\n",
       "        font-weight: 500;\n",
       "        font-size: 12px;\n",
       "    }\n",
       "    .token-viz .even-row {\n",
       "        background-color: rgba(240, 240, 240, 0.1);\n",
       "    }\n",
       "    .token-viz .odd-row {\n",
       "        background-color: rgba(255, 255, 255, 0.1);\n",
       "    }\n",
       "    </style>\n",
       "    \n",
       "    <div class=\"token-viz\">\n",
       "        <div class=\"header\" style=\"background-color: #555555;\">Input Sentence:</div>\n",
       "        <div class=\"sentence\">Mexico: peso :: China:</div>\n",
       "        \n",
       "        <div>\n",
       "            <div class=\"header\" style=\"background-color: #2471A3;\">Original Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" yuan\"> yuan</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.513</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 100%;\"></div>\n",
       "                                <span class=\"bar-text\">51.3%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" ren\"> ren</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.306</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 59%;\"></div>\n",
       "                                <span class=\"bar-text\">30.6%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" Yuan\"> Yuan</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.033</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 6%;\"></div>\n",
       "                                <span class=\"bar-text\">3.3%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" Ren\"> Ren</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.022</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 4%;\"></div>\n",
       "                                <span class=\"bar-text\">2.2%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" R\"> R</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.019</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 3%;\"></div>\n",
       "                                <span class=\"bar-text\">1.9%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "            \n",
       "            <div class=\"header\" style=\"background-color: #27AE60;\">New Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" ba\"> ba</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.278</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 54%;\"></div>\n",
       "                                <span class=\"bar-text\">27.8%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" yuan\"> yuan</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.229</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 44%;\"></div>\n",
       "                                <span class=\"bar-text\">22.9%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" t\"> t</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.024</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 4%;\"></div>\n",
       "                                <span class=\"bar-text\">2.4%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" ru\"> ru</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.022</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 4%;\"></div>\n",
       "                                <span class=\"bar-text\">2.2%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" r\"> r</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.018</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 3%;\"></div>\n",
       "                                <span class=\"bar-text\">1.8%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "        </div>\n",
       "    </div>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "interventions = [(*feature, 0) for feature in chinese_features_colon]\n",
    "interventions+= [(*feature, baht_activations[feature]) for feature in thai_features_colon]\n",
    "logits, new_activations = model.feature_intervention(s_peso_china, interventions)\n",
    "\n",
    "interventions = [(*feature, 0) for feature in chinese_features_colon]\n",
    "interventions+= [(*feature, baht_activations[feature]*2) for feature in thai_features_colon]\n",
    "\n",
    "with torch.inference_mode():\n",
    "    original_logits = model(s_peso_china)\n",
    "    new_logits, new_activations = model.feature_intervention(s_peso_china, interventions)\n",
    "\n",
    "display_topk_token_predictions(s_peso_china, original_logits, new_logits)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Other interventions are possible. What if we wanted to change the analogy type? Let's change the analogy form being about currency to being about languages. We'll the additional language features from this circuit.\n",
    "\n",
    "[`Mexico: Spanish :: China:` → `Chinese / Mandarin`](https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-Spanish-Chinese&clerps=%5B%5D&pinnedIds=17_8620_6%2C17_83871_6%2C11_105416_6%2C13_10254_6%2C12_57025_6%2C11_97996_6%2CE_15506_3%2C11_15631_6%2C9_58293_3%2C10_115816_6%2C2_75517_3%2C8_48318_3%2C0_66748_3%2C11_57318_6%2C8_87687_5%2C6_37297_5%2C4_123730_5%2C4_36913_5%2CE_5734_5%2C2_10764_5%2C13_125862_6%2C11_121863_6%2C8_74127_3%2C10_2257_6%2C10_2257_3%2C9_49585_3%2C5_6113_3%2C4_103160_3%2C3_28630_3%2C1_30074_3%2C2_88127_3%2C8_86003_3%2C8_124301_3%2C2_27793_3%2C4_2682_3%2C2_116265_3%2C2_106637_3%2C4_81660_3%2C3_16228_3%2C3_52190_3%2C0_17320_3%2C2_10130_1&supernodes=%5B%5B%22Output+%5C%22+Chinese%5C%22+%28p%3D0.198%29%22%2C%2217_8620_6%22%2C%2217_83871_6%22%5D%2C%5B%22China%22%2C%228_87687_5%22%2C%226_37297_5%22%2C%224_123730_5%22%2C%222_10764_5%22%2C%224_36913_5%22%5D%2C%5B%22China%22%2C%2211_121863_6%22%2C%2213_125862_6%22%2C%2211_57318_6%22%5D%2C%5B%22languages+%28upweight+English%2C+Spanish%29%22%2C%2210_115816_6%22%2C%2210_2257_6%22%2C%2211_105416_6%22%2C%2213_10254_6%22%2C%2211_97996_6%22%2C%2211_15631_6%22%2C%2212_57025_6%22%5D%2C%5B%22Spanish%22%2C%222_10130_1%22%2C%223_16228_3%22%2C%222_106637_3%22%2C%223_28630_3%22%2C%222_88127_3%22%2C%221_30074_3%22%2C%223_52190_3%22%2C%224_81660_3%22%5D%2C%5B%22language%22%2C%228_124301_3%22%2C%2210_2257_3%22%2C%229_49585_3%22%2C%224_103160_3%22%2C%225_6113_3%22%2C%228_86003_3%22%2C%222_27793_3%22%2C%220_17320_3%22%5D%2C%5B%22Countries%22%2C%224_2682_3%22%2C%222_116265_3%22%2C%228_74127_3%22%2C%229_58293_3%22%2C%222_75517_3%22%2C%228_48318_3%22%2C%220_66748_3%22%5D%5D&clickedId=Countries)\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/safety-research/circuit-tracer/main/demos/img/llama/spanish-china.png\" width=\"400\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_spanish_china = 'Mexico: Spanish :: China:'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "# peso pos\n",
    "currency_features_peso = china_yuan_supernodes['currency']\n",
    "\n",
    "china_chinese_url = 'https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-Spanish-Chinese&pinnedIds=17_8620_6%2C17_83871_6%2C11_105416_6%2C13_10254_6%2C12_57025_6%2C11_97996_6%2CE_15506_3%2C11_15631_6%2C9_58293_3%2C10_115816_6%2C2_75517_3%2C8_48318_3%2C0_66748_3%2C11_57318_6%2C8_87687_5%2C6_37297_5%2C4_123730_5%2C4_36913_5%2CE_5734_5%2C2_10764_5%2C13_125862_6%2C11_121863_6%2C8_74127_3%2C10_2257_6%2C10_2257_3%2C9_49585_3%2C5_6113_3%2C4_103160_3%2C3_28630_3%2C1_30074_3%2C2_88127_3%2C8_86003_3%2C8_124301_3%2C2_27793_3%2C4_2682_3%2C2_116265_3%2C2_106637_3%2C4_81660_3%2C3_16228_3%2C3_52190_3%2C0_17320_3%2C2_10130_1&supernodes=%5B%5B%22Output+%5C%22+Chinese%5C%22+%28p%3D0.198%29%22%2C%2217_8620_6%22%2C%2217_83871_6%22%5D%2C%5B%22China%22%2C%228_87687_5%22%2C%226_37297_5%22%2C%224_123730_5%22%2C%222_10764_5%22%2C%224_36913_5%22%5D%2C%5B%22China%22%2C%2211_121863_6%22%2C%2213_125862_6%22%2C%2211_57318_6%22%5D%2C%5B%22languages+%28upweight+English%2C+Spanish%29%22%2C%2210_115816_6%22%2C%2210_2257_6%22%2C%2211_105416_6%22%2C%2213_10254_6%22%2C%2211_97996_6%22%2C%2211_15631_6%22%2C%2212_57025_6%22%5D%2C%5B%22Spanish%22%2C%222_10130_1%22%2C%223_16228_3%22%2C%222_106637_3%22%2C%223_28630_3%22%2C%222_88127_3%22%2C%221_30074_3%22%2C%223_52190_3%22%2C%224_81660_3%22%5D%2C%5B%22language%22%2C%228_124301_3%22%2C%2210_2257_3%22%2C%229_49585_3%22%2C%224_103160_3%22%2C%225_6113_3%22%2C%228_86003_3%22%2C%222_27793_3%22%2C%220_17320_3%22%5D%2C%5B%22Countries%22%2C%224_2682_3%22%2C%222_116265_3%22%2C%228_74127_3%22%2C%229_58293_3%22%2C%222_75517_3%22%2C%228_48318_3%22%2C%220_66748_3%22%5D%5D&clickedId=Countries&pruningThreshold=0.7&densityThreshold=0.99'\n",
    "china_chinese_supernodes, _ = decode_url_features(china_chinese_url)\n",
    "language_features_spanish = china_chinese_supernodes['language']  # These are at the \"Spanish\" position"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    .token-viz {\n",
       "        font-family: system-ui, -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;\n",
       "        margin-bottom: 10px;\n",
       "        max-width: 700px;\n",
       "    }\n",
       "    .token-viz .header {\n",
       "        font-weight: bold;\n",
       "        font-size: 14px;\n",
       "        margin-bottom: 3px;\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        color: white;\n",
       "        display: inline-block;\n",
       "    }\n",
       "    .token-viz .sentence {\n",
       "        background-color: rgba(200, 200, 200, 0.2);\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        border: 1px solid rgba(100, 100, 100, 0.5);\n",
       "        font-family: monospace;\n",
       "        margin-bottom: 8px;\n",
       "        font-weight: 500;\n",
       "        font-size: 14px;\n",
       "    }\n",
       "    .token-viz table {\n",
       "        width: 100%;\n",
       "        border-collapse: collapse;\n",
       "        margin-bottom: 8px;\n",
       "        font-size: 13px;\n",
       "        table-layout: fixed;\n",
       "    }\n",
       "    .token-viz th {\n",
       "        text-align: left;\n",
       "        padding: 4px 6px;\n",
       "        font-weight: bold;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        background-color: rgba(200, 200, 200, 0.3);\n",
       "    }\n",
       "    .token-viz td {\n",
       "        padding: 3px 6px;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        font-weight: 500;\n",
       "        overflow: hidden;\n",
       "        text-overflow: ellipsis;\n",
       "        white-space: nowrap;\n",
       "    }\n",
       "    .token-viz .token-col {\n",
       "        width: 20%;\n",
       "    }\n",
       "    .token-viz .prob-col {\n",
       "        width: 15%;\n",
       "    }\n",
       "    .token-viz .dist-col {\n",
       "        width: 65%;\n",
       "    }\n",
       "    .token-viz .monospace {\n",
       "        font-family: monospace;\n",
       "    }\n",
       "    .token-viz .bar-container {\n",
       "        display: flex;\n",
       "        align-items: center;\n",
       "    }\n",
       "    .token-viz .bar {\n",
       "        height: 12px;\n",
       "        min-width: 2px;\n",
       "    }\n",
       "    .token-viz .bar-text {\n",
       "        margin-left: 6px;\n",
       "        font-weight: 500;\n",
       "        font-size: 12px;\n",
       "    }\n",
       "    .token-viz .even-row {\n",
       "        background-color: rgba(240, 240, 240, 0.1);\n",
       "    }\n",
       "    .token-viz .odd-row {\n",
       "        background-color: rgba(255, 255, 255, 0.1);\n",
       "    }\n",
       "    </style>\n",
       "    \n",
       "    <div class=\"token-viz\">\n",
       "        <div class=\"header\" style=\"background-color: #555555;\">Input Sentence:</div>\n",
       "        <div class=\"sentence\">Mexico: peso :: China:</div>\n",
       "        \n",
       "        <div>\n",
       "            <div class=\"header\" style=\"background-color: #2471A3;\">Original Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" yuan\"> yuan</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.513</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 100%;\"></div>\n",
       "                                <span class=\"bar-text\">51.3%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" ren\"> ren</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.306</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 59%;\"></div>\n",
       "                                <span class=\"bar-text\">30.6%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" Yuan\"> Yuan</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.033</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 6%;\"></div>\n",
       "                                <span class=\"bar-text\">3.3%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" Ren\"> Ren</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.022</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 4%;\"></div>\n",
       "                                <span class=\"bar-text\">2.2%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" R\"> R</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.019</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 3%;\"></div>\n",
       "                                <span class=\"bar-text\">1.9%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "            \n",
       "            <div class=\"header\" style=\"background-color: #27AE60;\">New Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" Mandarin\"> Mandarin</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.159</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 31%;\"></div>\n",
       "                                <span class=\"bar-text\">15.9%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" Chinese\"> Chinese</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.157</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 30%;\"></div>\n",
       "                                <span class=\"bar-text\">15.7%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" mand\"> mand</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.090</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 17%;\"></div>\n",
       "                                <span class=\"bar-text\">9.0%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" English\"> English</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.058</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 11%;\"></div>\n",
       "                                <span class=\"bar-text\">5.8%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" chinese\"> chinese</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.032</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 6%;\"></div>\n",
       "                                <span class=\"bar-text\">3.2%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "        </div>\n",
       "    </div>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "_, language_activations = model.get_activations(s_spanish_china, sparse=True)\n",
    "logits = model(s_peso_china)\n",
    "\n",
    "interventions = [(*feature, 0) for feature in currency_features_peso]\n",
    "interventions+= [(*feature, language_activations[feature]*5) for feature in language_features_spanish]\n",
    "\n",
    "with torch.inference_mode():\n",
    "    original_logits = model(s_peso_china)\n",
    "    new_logits, new_activations = model.feature_intervention(s_peso_china, interventions)\n",
    "\n",
    "\n",
    "display_topk_token_predictions(s_peso_china, original_logits, new_logits)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this example, we have to push the features a little, multiplying the language activations quite strongly in order to counteract the currency features."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example: Feature generalization across contexts\n",
    "\n",
    "In the previous examples, we had paired inputs, identical in structure except for one or two crucial words, that we used to identify features. Can we do this in a more challenging context, re-using features between examples that are not paired? \n",
    "We'll try to manipulate the model on this input: [`Mexico: Spanish :: US:` → `English`](<https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-Spanish-English&clerps=[[%2212020491%22,%22UK/English%22],[%2213010254%22,%22languages+(upweight+English)%22],[%2211015631%22,%22language%22],[%2212057025%22,%22language%22]]&pinnedIds=17_6498_6%2C11_105416_6%2C9_58293_3%2C10_2257_3%2C9_49585_3%2C5_6113_3%2CE_15506_3%2C10_813_6%2CE_2326_5%2C2_37574_5%2C4_75078_5%2C13_10254_6%2C11_97996_6%2C12_57025_6%2C12_20491_6%2C11_15631_6%2C0_74386_5&supernodes=[[%22language%22,%225_6113_3%22,%229_58293_3%22,%229_49585_3%22,%2210_2257_3%22],[%22languages+(upweight+English)%22,%2211_97996_6%22,%2211_105416_6%22],[%22US%22,%220_74386_5%22,%2210_813_6%22,%222_37574_5%22,%224_75078_5%22],[%22language%22,%2211_15631_6%22,%2212_57025_6%22]]&clickedId=11_15631_6&pruningThreshold=0.7&densityThreshold=0.99>).\n",
    "\n",
    "The input has this circuit and features:\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/safety-research/circuit-tracer/main/demos/img/llama/spanish-us.png\" width=\"400\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "US_features = [\n",
    "\tFeature(layer=1, pos=5, feature_idx=66436),\n",
    "\tFeature(layer=0, pos=5, feature_idx=74386),\n",
    "\tFeature(layer=10, pos=6, feature_idx=813),\n",
    "\tFeature(layer=2, pos=5, feature_idx=37574),\n",
    "\tFeature(layer=4, pos=5, feature_idx=75078)\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll try to use the Danish features from this sentence to manipulate the answer into \"Danish\", rather than \"English\": [`Zagreb:Croatia :: Copenhagen:` → `Denmark`](https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-croatia-denmark&pinnedIds=17_24539_9%2C17_35440_9%2C11_121939_9%2C10_89787_9%2C8_62013_8%2C7_21695_8%2C12_79838_9%2C13_99984_9%2C14_3293_9%2C13_50480_9%2C14_95110_9%2C15_95091_9%2CE_64161_8%2C5_29307_8%2C2_58224_8%2C1_75211_8%2C3_99868_8%2C11_129365_9%2C11_61135_9%2C11_121939_8%2C12_57025_9%2C11_69725_9%2C11_69725_8%2C9_39864_9%2C12_127919_9%2C10_108437_9%2C10_76757_9%2C9_85946_9%2CE_25_9%2C8_48318_6%2C2_75517_6%2C1_74922_6%2C3_17431_6%2C2_71193_6%2CE_18283_5%2CE_689_6%2C0_88159_6%2C1_37886_6%2C11_90929_9%2C10_89787_8%2C12_127919_8%2C9_39864_8%2C11_129365_8&supernodes=%5B%5B%22Scandinavia%22%2C%2215_95091_9%22%2C%2213_99984_9%22%2C%2214_3293_9%22%2C%2211_121939_9%22%2C%2211_129365_9%22%5D%2C%5B%22Commas+separating+cities+and+states%22%2C%2210_108437_9%22%2C%2211_61135_9%22%2C%2210_76757_9%22%2C%229_85946_9%22%5D%2C%5B%22Countries%22%2C%222_71193_6%22%2C%228_48318_6%22%2C%223_17431_6%22%2C%221_74922_6%22%2C%220_88159_6%22%2C%221_37886_6%22%2C%222_75517_6%22%5D%2C%5B%22upweight+demonyms%22%2C%2212_57025_9%22%2C%2211_90929_9%22%5D%2C%5B%22Scandinavia%22%2C%2210_89787_9%22%2C%2211_129365_8%22%2C%2210_89787_8%22%2C%2211_121939_8%22%2C%228_62013_8%22%2C%227_21695_8%22%2C%225_29307_8%22%2C%223_99868_8%22%2C%221_75211_8%22%2C%222_58224_8%22%5D%2C%5B%22Denmark%22%2C%2211_69725_8%22%2C%229_39864_8%22%2C%2212_127919_8%22%5D%2C%5B%22Denmark%22%2C%229_39864_9%22%2C%2214_95110_9%22%2C%2211_69725_9%22%2C%2212_127919_9%22%2C%2212_79838_9%22%2C%2213_50480_9%22%5D%2C%5B%22Output+%5C%22Denmark%5C%22%22%2C%2217_35440_9%22%2C%2217_24539_9%22%5D%5D&clickedId=17_24539_9&pruningThreshold=0.7&densityThreshold=0.99)\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/safety-research/circuit-tracer/main/demos/img/llama/zagreb-copenhagen.png\" width=\"400\">\n",
    "\n",
    "We'll use the features at the \"Copenhagen\" position."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "denmark_url = 'https://www.neuronpedia.org/llama-3.2-1b/graph?slug=llama-croatia-denmark&pinnedIds=17_24539_9%2C17_35440_9%2C11_121939_9%2C10_89787_9%2C8_62013_8%2C7_21695_8%2C12_79838_9%2C13_99984_9%2C14_3293_9%2C13_50480_9%2C14_95110_9%2C15_95091_9%2CE_64161_8%2C5_29307_8%2C2_58224_8%2C1_75211_8%2C3_99868_8%2C11_129365_9%2C11_61135_9%2C11_121939_8%2C12_57025_9%2C11_69725_9%2C11_69725_8%2C9_39864_9%2C12_127919_9%2C10_108437_9%2C10_76757_9%2C9_85946_9%2CE_25_9%2C8_48318_6%2C2_75517_6%2C1_74922_6%2C3_17431_6%2C2_71193_6%2CE_18283_5%2CE_689_6%2C0_88159_6%2C1_37886_6%2C11_90929_9%2C10_89787_8%2C12_127919_8%2C9_39864_8%2C11_129365_8&supernodes=%5B%5B%22Scandinavia%22%2C%2215_95091_9%22%2C%2213_99984_9%22%2C%2214_3293_9%22%2C%2211_121939_9%22%2C%2211_129365_9%22%5D%2C%5B%22Commas+separating+cities+and+states%22%2C%2210_108437_9%22%2C%2211_61135_9%22%2C%2210_76757_9%22%2C%229_85946_9%22%5D%2C%5B%22Countries%22%2C%222_71193_6%22%2C%228_48318_6%22%2C%223_17431_6%22%2C%221_74922_6%22%2C%220_88159_6%22%2C%221_37886_6%22%2C%222_75517_6%22%5D%2C%5B%22upweight+demonyms%22%2C%2212_57025_9%22%2C%2211_90929_9%22%5D%2C%5B%22Scandinavia%22%2C%2210_89787_9%22%2C%2211_129365_8%22%2C%2210_89787_8%22%2C%2211_121939_8%22%2C%228_62013_8%22%2C%227_21695_8%22%2C%225_29307_8%22%2C%223_99868_8%22%2C%221_75211_8%22%2C%222_58224_8%22%5D%2C%5B%22Denmark%22%2C%2211_69725_8%22%2C%229_39864_8%22%2C%2212_127919_8%22%5D%2C%5B%22Denmark%22%2C%229_39864_9%22%2C%2214_95110_9%22%2C%2211_69725_9%22%2C%2212_127919_9%22%2C%2212_79838_9%22%2C%2213_50480_9%22%5D%2C%5B%22Output+%5C%22Denmark%5C%22%22%2C%2217_35440_9%22%2C%2217_24539_9%22%5D%5D&clickedId=17_24539_9&pruningThreshold=0.7&densityThreshold=0.99'\n",
    "denmark_supernodes, _ = decode_url_features(denmark_url)\n",
    "\n",
    "# features at the Copenhagen position\n",
    "scandinavia_features = denmark_supernodes['Scandinavia (2)']\n",
    "denmark_features = denmark_supernodes['Denmark']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    .token-viz {\n",
       "        font-family: system-ui, -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;\n",
       "        margin-bottom: 10px;\n",
       "        max-width: 700px;\n",
       "    }\n",
       "    .token-viz .header {\n",
       "        font-weight: bold;\n",
       "        font-size: 14px;\n",
       "        margin-bottom: 3px;\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        color: white;\n",
       "        display: inline-block;\n",
       "    }\n",
       "    .token-viz .sentence {\n",
       "        background-color: rgba(200, 200, 200, 0.2);\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        border: 1px solid rgba(100, 100, 100, 0.5);\n",
       "        font-family: monospace;\n",
       "        margin-bottom: 8px;\n",
       "        font-weight: 500;\n",
       "        font-size: 14px;\n",
       "    }\n",
       "    .token-viz table {\n",
       "        width: 100%;\n",
       "        border-collapse: collapse;\n",
       "        margin-bottom: 8px;\n",
       "        font-size: 13px;\n",
       "        table-layout: fixed;\n",
       "    }\n",
       "    .token-viz th {\n",
       "        text-align: left;\n",
       "        padding: 4px 6px;\n",
       "        font-weight: bold;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        background-color: rgba(200, 200, 200, 0.3);\n",
       "    }\n",
       "    .token-viz td {\n",
       "        padding: 3px 6px;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        font-weight: 500;\n",
       "        overflow: hidden;\n",
       "        text-overflow: ellipsis;\n",
       "        white-space: nowrap;\n",
       "    }\n",
       "    .token-viz .token-col {\n",
       "        width: 20%;\n",
       "    }\n",
       "    .token-viz .prob-col {\n",
       "        width: 15%;\n",
       "    }\n",
       "    .token-viz .dist-col {\n",
       "        width: 65%;\n",
       "    }\n",
       "    .token-viz .monospace {\n",
       "        font-family: monospace;\n",
       "    }\n",
       "    .token-viz .bar-container {\n",
       "        display: flex;\n",
       "        align-items: center;\n",
       "    }\n",
       "    .token-viz .bar {\n",
       "        height: 12px;\n",
       "        min-width: 2px;\n",
       "    }\n",
       "    .token-viz .bar-text {\n",
       "        margin-left: 6px;\n",
       "        font-weight: 500;\n",
       "        font-size: 12px;\n",
       "    }\n",
       "    .token-viz .even-row {\n",
       "        background-color: rgba(240, 240, 240, 0.1);\n",
       "    }\n",
       "    .token-viz .odd-row {\n",
       "        background-color: rgba(255, 255, 255, 0.1);\n",
       "    }\n",
       "    </style>\n",
       "    \n",
       "    <div class=\"token-viz\">\n",
       "        <div class=\"header\" style=\"background-color: #555555;\">Input Sentence:</div>\n",
       "        <div class=\"sentence\">Mexico: Spanish :: US:</div>\n",
       "        \n",
       "        <div>\n",
       "            <div class=\"header\" style=\"background-color: #2471A3;\">Original Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" English\"> English</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.320</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 82%;\"></div>\n",
       "                                <span class=\"bar-text\">32.0%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" Spanish\"> Spanish</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.259</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 66%;\"></div>\n",
       "                                <span class=\"bar-text\">25.9%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" Mexican\"> Mexican</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.025</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 6%;\"></div>\n",
       "                                <span class=\"bar-text\">2.5%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" French\"> French</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.021</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 5%;\"></div>\n",
       "                                <span class=\"bar-text\">2.1%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" \"> </td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.019</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 4%;\"></div>\n",
       "                                <span class=\"bar-text\">1.9%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "            \n",
       "            <div class=\"header\" style=\"background-color: #27AE60;\">New Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" Swedish\"> Swedish</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.389</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 100%;\"></div>\n",
       "                                <span class=\"bar-text\">38.9%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" English\"> English</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.138</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 35%;\"></div>\n",
       "                                <span class=\"bar-text\">13.8%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" Spanish\"> Spanish</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.127</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 32%;\"></div>\n",
       "                                <span class=\"bar-text\">12.7%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" Danish\"> Danish</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.037</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 9%;\"></div>\n",
       "                                <span class=\"bar-text\">3.7%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" Norwegian\"> Norwegian</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.027</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 6%;\"></div>\n",
       "                                <span class=\"bar-text\">2.7%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "        </div>\n",
       "    </div>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "s_denmark = \"Zagreb:Croatia :: Copenhagen:\"\n",
    "s_US = \"Mexico: Spanish :: US:\"\n",
    "_, denmark_activations = model.get_activations(s_denmark, sparse=True)\n",
    "logits = model(s_US)\n",
    "\n",
    "interventions = [(*feature, 0) for feature in US_features]\n",
    "interventions+= [(feature.layer, 5, feature.feature_idx, denmark_activations[feature]*5) for feature in scandinavia_features + denmark_features]\n",
    "\n",
    "with torch.inference_mode():\n",
    "    original_logits = model(s_US)\n",
    "    new_logits, new_activations = model.feature_intervention(s_US, interventions)\n",
    " \n",
    "\n",
    "display_topk_token_predictions(s_US, original_logits, new_logits)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see that rather than Danish, Swedish wins out! But the broader concept of Scandinavia does come through."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
